planet.freedesktop.org
September 10, 2018

All Systems Go! 2018 Tickets Selling Out Quickly!

Buy your tickets for All Systems Go! 2018 soon; they are selling out quickly! The conference takes place on September 28-30 in Berlin, Germany, a bit over two weeks from now.

Why should you attend? If you are interested in low-level Linux userspace, then All Systems Go! is the right conference for you. It covers all topics relevant to foundational open-source Linux technologies. For details on the covered topics see our schedule for day #1 and for day #2.

For more information please visit our conference website!

See you in Berlin!

September 09, 2018

Last month KDE Akademy was held in Vienna. It was the first Akademy I attended, and I hadn’t yet found the time to write about the impression I got from it: what was nice and what could be improved, from the point of view of someone new to it. Time to catch up on that.

Akademy came at a bad point in time for me. I was right in the middle of writing code for a larger feature in KWin’s Wayland session: drag-and-drop support between Wayland native and Xwayland windows. When I began the work on this feature back in July, I hoped I could finish it by Akademy. Not being able to do so felt demotivating, but I have to admit my plan was way too optimistic anyway. Only now, several weeks after Akademy, do I feel comfortable enough about my code to use it on my work system without constant anxiety about fatal session crashes. So I went to my first Akademy with a bit less enthusiasm than I otherwise probably would have shown. On the other hand, this maybe also gives me a more neutral take on it.

Akademy is basically split into two phases: the talks at the beginning on Saturday and Sunday and the BoFs for the rest of the time from Monday till Friday.

The talks

This is basically what you expect from a conference: talks by people involved in the community or friends from the outside about different topics related to KDE. And these topics were very different indeed, which shows how large the KDE community and its reach really are. Often KDE is identified with the desktop environment Plasma alone, but there is much more to it. Having this kind of diversity in topics is a great plus for Akademy.

Still, one has to say that Plasma is KDE’s flagship offering and as such deserves a central spot in the lineup. So, judging from a Plasma dev’s point of view, was this the case at this Akademy? That’s difficult to say. There were interesting longer talks by David and Nate about the core Plasma Desktop experience: David’s talk was technical, Nate’s struck a broader, more visionary note, and both presented a path forward for Plasma. There were also talks about Plasma in other contexts, for example by Bhushan about Plasma on Mobile.

So judging by quantity there were enough Plasma talks, I would say. I can’t really complain anyway, since I personally did not contribute to increasing the quantity by submitting a talk proposal myself. The reason was simply that at my first Akademy I wanted to be a listener only. So let us say quantity was fine, and quality, from what I can tell as a listener, as well.

Still, there was a distinct feel of too few, too disconnected, and I believe especially too disconnected describes it correctly, not only in regard to Plasma. The talks were spread out over two days in two differently sized rooms, somewhat randomly. There was no noticeable order to them or a grand arc connecting them besides being somewhat KDE related. One could argue this is a direct result of the KDE community being so diverse in interests and topics. But I believe that with good planning by some key figures from the different parts of the community, the talks could have been placed better and would have felt less disjointed. More relevant keynotes would also have helped; they could have provided, for example, an overarching focus point for the conference, interconnecting the talks.

The BoFs

The second part of Akademy was the BoFs, which ran for a full workweek. BoF stands for Birds of a Feather and denotes an informal discussion group about a specific topic. There were many BoFs, often in parallel, and I visited lots of them.

I won’t go into detail on all of them, but a general impression I got was that there is no clear definition of what a BoF should be and what should happen in one. That is kind of the point of a BoF, but it still often made for disappointment. In an extreme case, a BoF I visited was more an internal discussion of a team of half a dozen people working closely together, which left everybody who was just interested in learning more about the project on the outside, since they did not know enough about the project’s internals yet. The BoF was not advertised as such, and outsiders visiting out of interest were surely disappointed. Other BoFs were more welcoming to newcomers, but still lacked a guiding structure or just someone moderating the discussion efficiently.

A plan to improve the BoF sessions could be the following: split them up into a mandatory common / introductory part and an optional team / current progress / special topic part, and advertise them as such to conference attendees. While both parts would still be open to anyone, the common part would be specifically suited to newcomers to the project, while the team part could be used to discuss current topics in depth. At first this sounds like double the work for project members, but the common part could simply be some reusable presentation slides explaining the core structure of the project, together with a Q&A session. So I believe the additional amount of work is small. People would also know what they are getting into and could plan their time efficiently. Besides one person from the team, most often probably the maintainer or project lead, others could skip the introductory part, and newcomers could skip the team part if they don’t yet feel knowledgeable enough to follow the discussion.

The organisation

In general I felt the conference was well organized for being run by volunteers, who were, by the way, super friendly and keen on helping and solving problems. There were a few issues, ranging from non-functional printers to a way too small social event venue, but they were too minor to be worth delving into further.

But there is one large issue that cannot be ignored, and that is the availability and quality of videos from the conference. The videos have only been published recently, and I think this is too late. They should be online no later than one week after Akademy.

And watching these videos is no fun at all. The talk recordings have a low resolution, are filmed from way back in the room, and the voice quality is often abysmal. Compared to last Akademy the videos have already improved, but they still lack the quality you would expect from an open tech community that has the web as its main communication channel.

As an example of what a good recording looks like, take a look at this talk out of the chamber of darkness. I personally would take some of the Pineapple money and pay a team of professionals at the next Akademy to record the talks in HD and upload them promptly, if we cannot do it on our own. Enabling people who cannot travel to Akademy to watch the presentations pleasantly, plus the additional exposure, is always worth it.

The social and unofficial program

Of course technical topics are not everything at Akademy. The conference also exists to connect like-minded people and let otherwise semi-anonymous contributors meet each other in person, which often enables them to work better together on KDE projects in the future. Overall this was fine for me personally. I know most of the Plasma devs already, and we just had a meeting in Berlin a few months ago. So there was not really that much need to talk in person again, although it helped with some current hot topics of course.

But there were some members of the VDG I was really looking forward to meeting in person, and the discussions we had were very fruitful. It was also great to get to know Carlos from the GNOME project. I hope we can improve the collaboration between our communities in terms of Wayland in the future. What I missed was talking more with some of the KDE Apps and Frameworks devs. I am not sure why that was. Maybe my personal area of work just does not intersect that much with app developers at the moment.

Conclusion

This article was in some parts quite critical of Akademy, but that does not mean the conference was bad. Quite the contrary: I enjoyed meeting all the members and friends of the KDE community, and I can recommend going to anyone, from user to maintainer of KDE software. It doesn’t matter what kind of KDE offering you use or contribute to, you will find sessions that interest you and people who share that same interest. The next Akademy is in less than a year, and I look forward to meeting you there again or for the first time.

September 07, 2018

TWIV3D has been inactive for a bit, because I’ve been inactive for a bit. On August 15th, we added Moss Anholt to our family:

moss

So most of my time has been spent laying around with them and keeping us fed instead of writing software.

Here and there I’ve managed to write some patches. I’ve been working on meeting the current GLES3 CTS’s requirement for a 565 window system buffer being available. We don’t expose those on X11 today, because why would you? Your screen is 8888, so all you could do with 565 is render to a pixmap or buffer. Rendering to a pixmap is useless, because EGL doesn’t tell you whether R is in the top or bottom bits, so the X11 client doesn’t know how to interpret the rendered contents other than through glReadPixels() (at which point, just use an FBO). My solution for this bad requirement from the CTS has been to add support for 565 pbuffers, which is pointless but is the minimum. This led to a patch to the xserver and a fix to V3D for blending against RTs without alpha channels.
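
To give a concrete picture of what the CTS is asking for, here is a generic EGL attribute list requesting a 565 pbuffer-capable config. This is only an illustrative sketch of standard EGL usage, not code from the CTS or from the Mesa patches:

#include <EGL/egl.h>

/* Request a 565 pbuffer config, the kind of config the GLES3 CTS expects. */
static const EGLint attribs_565_pbuffer[] = {
    EGL_SURFACE_TYPE,    EGL_PBUFFER_BIT,
    EGL_RED_SIZE,        5,
    EGL_GREEN_SIZE,      6,
    EGL_BLUE_SIZE,       5,
    EGL_ALPHA_SIZE,      0,
    EGL_RENDERABLE_TYPE, EGL_OPENGL_ES3_BIT,
    EGL_NONE
};
/* ...then passed to eglChooseConfig(dpy, attribs_565_pbuffer, &cfg, 1, &n) */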

I’m particularly proud of a small series to create a default case handler function for gallium pipe caps. Previously, every addition of a PIPE_CAP (which happens probably at least 10 times a year) had to touch every single driver, or that driver would start failing. It’s easy to forget a driver, particularly when rebasing a series across the addition of a new driver. And for driver authors, you’re faced with a list of 100 values whose correct state you don’t know when starting to write a driver. The new helper lets CAP creators set a default value (usually whatever the hardcoded value was before their new CAP), and lets driver authors specify a much smaller set of required caps to get started.
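
As a rough sketch of the pattern (the driver name and the returned values below are made up, and I’m assuming a defaults helper along the lines of Mesa’s u_pipe_screen_get_param_defaults()), a driver’s get_param callback only needs to handle the caps it actually cares about and can let everything else fall through:

/* Illustrative only: "mydriver" and its cap values are hypothetical. */
static int
mydriver_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
{
   switch (param) {
   case PIPE_CAP_NPOT_TEXTURES:
      return 1;                 /* caps this driver explicitly supports */
   case PIPE_CAP_MAX_RENDER_TARGETS:
      return 4;
   default:
      /* Everything else gets the default chosen by whoever added the
       * cap, so new caps no longer break drivers that don't handle them. */
      return u_pipe_screen_get_param_defaults(pscreen, param);
   }
}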

Just before Moss arrived, I had landed some patches to VC4 to improve texture upload/download performance. Non-GL SDL apps on glamor were really slow, because the window wouldn’t be aligned to tiles and we’d download the tile-aligned area from the framebuffer to a raster-order temporary, store our new contents into it, then reupload that to the screen. The download from write-combined memory is brutally slow. My new code (mostly a rebase of patches I wrote in January 2017!) got the following results:

  • Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
  • Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
  • Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)

I also wrote a followup patch to avoid the copy from the user’s buffer into the temporary in the driver for texture uploads, for another 12.1586% +/- 1.38155% (n=145)

Other things since my last post:

  • Fixed VCM cache size setup and the V3D driver’s idea of the VPM size.
  • Fixed a leak of X11 pixmaps backing pbuffers on DRI3.
  • Fixed vc4_fence_server_sync() on pre-syncobj kernels.
  • Fixed vc4 uniform offsets after a sampler in a structure.
  • Improved vc4 uniform upload performance.
  • Fixed vc4 MSAA tile loads when both Z and color are needed.
  • Fixed vc4 rendering to cubemap EGL images.
  • Fixed vc4 leak of the no-vertex-elements workaround BO.
  • Fixed some v3d register spilling bugs.

September 04, 2018

libinput 1.12 was a massive development effort (over 300 patchsets) with a bunch of new features being merged. It'll be released next week or so, so it's worth taking a step back and looking at what actually changed.

The device quirks files replace the previously used hwdb-based udev properties. I've written about this in more detail here but the gist is: we have our own .ini style file format that can match on devices and apply the various quirks devices need. This simplifies debugging a lot: we can now reliably tell users why a quirks file applies or doesn't apply, which was historically a problem with the hwdb.
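
To give a feel for the format (the section name, device matches and values below are invented for illustration; check the libinput quirks documentation for the exact key names), an entry looks roughly like this:

[Example Vendor Touchpad Pressure Override]
MatchUdevType=touchpad
MatchName=*Example TouchPad*
MatchDMIModalias=dmi:*svnExampleVendor*
AttrPressureRange=150:130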

The sphinx-based documentation was merged, fixed and added to. We switched to sphinx for the docs and the result is much more user-friendly. Which was the point: it was a switch from developer-oriented documentation to user-oriented documentation. Not that documentation is ever finished.

The usual set of touchpad improvements went in, e.g. the slight motion on finger up is now ignored. We have size-based thumb detection now (useful for Apple touchpads!). And of course various quirks for better pressure ranges, etc. Tripletap on some synaptics touchpads had a tendency to cause multiple taps because of some weird event sequence. Movement in the software button area now generates events; the buttons are not just a dead zone anymore. Pointer jump detection is more adaptive now and catches and discards smaller jumps that previously slipped through the cracks. A particularly quirky behaviour was seen on Dell XPS i2c touchpads that exhibit a huge pointer jump, courtesy of the trackpoint controller going to sleep and taking its time to wake up. The delay is still there but the pointer at least lands in the correct location.

We now have improved direction-locking for two-finger scrolling on touchpads. Scrolling up/down should not generate horizontal scroll events anymore as long as the movement is close enough to vertical. This feature is transparent: a diagonal or horizontal movement will immediately disable the direction lock and produce horizontal scroll events as expected.

The trackpoint acceleration has been re-done, see this post for more details and links to the background articles. I've only received one bug report for the new acceleration so it seems to work quite well now. Trackpoints that send events in bursts (e.g. bluetooth ones) are now smoothed to avoid jerky movement.

Velocity averaging was dropped to increase pointer accuracy. Previously we averaged the velocity across multiple events, which made the motion smoother on jittery devices but less accurate on good devices.

We build on FreeBSD now. Presumably this also means it works on FreeBSD :)

libinput now supports palm detection on touchscreens, at least where the ABS_MT_TOOL_TYPE evdev bit is provided.

I think that's about it. Busy days...

September 02, 2018

A 3D model of a cat with Phong shading

In January 2018, I began using my C201 full-time. In June 2018, its charging port broke, apparently due to physical damage, and I was forced to discontinue my school laptop and Mali T760 development machine.

Rest in peace, Veyron Speedy.


In its place, I purchased and adopted Kevin, better known by its brand name, the Samsung Chromebook Plus (v1). Kevin contains a 64-bit Rockchip RK3399 processor, a substantial upgrade over the C201’s Rockchip RK3288, with a Mali T860 graphics processor. The T860, a new version of Mali Midgard in between its predecessor T760 and its successor Bifrost, was not supported in Panfrost, the community-led free software driver for modern Mali GPUs ubiquitous in phones and Chromebooks.

Meanwhile, back in May, TL Lim of Pine64 generously offered to send me an RK3399 development board, a ROCKPRO64. During my university days, the board arrived… and so work began on T860 support.


The new hardware contains a number of challenges unseen in its predecessors. For instance, the RK3288 is a 32-bit processor (armv7) whereas the RK3399 is 64-bit (armv8). While Midgard is natively 64-bit, there are many memory-saving optimizations specifically for 32-bit machines like the RK3288. For instance, in the 32-bit command stream, the field encoding line width is stuffed in between a pointer at the end of the header and the beginning metadata of the tiler payload, in a spot that normally would contain padding to align everything. On a 64-bit machine, however, the preceding pointer is a full 8 bytes, and there is no room for the line width hack to function; this field is instead moved from the first to the last index in the data structure.
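
To illustrate the alignment trick (the structs below are purely hypothetical; the field names and layout are invented and do not reflect the actual Midgard command stream), a 32-bit pointer leaves a 4-byte hole before the next 8-byte-aligned field, and that hole is where a small field can hide:

#include <stdint.h>

/* Hypothetical 32-bit layout: line_width fits into would-be padding. */
struct header_armv7 {
    uint32_t payload_ptr;   /* 4-byte pointer */
    uint32_t line_width;    /* lives in what would otherwise be padding */
    uint64_t tiler_meta;    /* 8-byte aligned payload metadata */
};

/* Hypothetical 64-bit layout: the 8-byte pointer fills the hole, so the
 * field has to move to the end of the structure instead. */
struct header_armv8 {
    uint64_t payload_ptr;
    uint64_t tiler_meta;
    uint32_t line_width;
};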

Another small change that took far too long to figure out was the modified kernel API for importing memory on 64-bit systems. Imports are necessary to set up the framebuffer correctly, avoiding expensive fullscreen blits. Once I read through the 64-bit portions of the kernel module, it was clear how to implement the new calls, but this change nevertheless slowed initial progress.

Another more challenging departure was the FBDs. From the kernel sources, we know there are two mysterious acronyms, SFBD and MFBD. We figured out these referred to the Framebuffer Descriptor. Why two? It turns out that older Midgard used the SFBD, and newer Midgard and Bifrost use the MFBD. The change coincided with the introduction of multi-render target (MRT) support in the newer chips. Accordingly, we are guessing these are the Single Framebuffer Descriptor (SFBD) and Multiple Framebuffer Descriptor (MFBD), though we can of course never be sure.

In any event, while the T760 supported MFBDs, it was possible to use a 32-bit SFBD instead. On the newer T860, due to the upgraded development hardware, we had no choice but to support the newer data structure. Luckily, Connor had already decoded the majority of the MFBD as part of his investigation in Bifrost. I was easily able to adapt his changes for the T860-family of processors, et voila, the command stream was making sense.

After decoding the new parts of the command stream, I implemented support in the Gallium driver for the new structures; as structures like the framebuffer descriptor are so fundamental, this implied significant changes. But after some fervent coding, the driver for T860 reached feature parity with Panfrost on the T760.

Turning our attention to shaders, it turns out there are no significant differences in the shader core between these two chips. That is, the minute I could replay a triangle on my laptop, we had free shaders. Woohoo!

Once the new hardware was set up, I turned my focus to cleaning up the codebase, implementing more of OpenGL ES 2.0, and debugging issues in the existing implementations. I squashed bugs (“Ow!”) ranging from incorrect stencil testing to broken colour masking in the absence of blending to missing movs in certain shaders. In terms of new features, I implemented the scissor test and added support for shaders with multiple varyings, a common pattern which did not show up in the simpler 3D tests. Multi-varying support required changes in both the command stream driver and the compiler, although we’re now able to understand why varyings are encoded in the (quirky) way they are.

I also (finally) fixed glReadPixels support, so we can start testing with dEQP and Piglit. The results remain somewhat depressing, but it’s a starting point for further driver debugging as we make the jump from proof-of-concept to production code.

Soup’s up, everypony!

(The soup is code.)


On the Bifrost side, Connor Abbott and Lyude Paul have been busy!

After decoding more of the Bifrost ISA, Connor turned his attention to the Bifrost command-stream, an iterative improvement on the Midgard command-stream. Building on my earlier work with the Midgard command stream, Connor identified analogue structures between the two versions, as well as marking off new structures including the aforementioned MFBD.

After decoding these initial structures, he implemented support in panwrap for generating replay-style traces of Bifrost command streams, again building on the infrastructure originally built for Midgard. Although he did not make direct use of the replay facilities in panwrap, these changes did enable a great deal of the Bifrost command stream to be elucidated.

He then went a step further, beyond OpenGL ES 2.0 vertex shaders and fragment shaders, venturing into the land of… compute shaders. Dun dun dun! (“What, no scary music?” “Sorry, Proselint says you already violated the 30ppm of exclamations rule. We are therefore forbidden from aiding your overexcited shenanigans.” “Fiiine. Do I at least get a singing entourage?” “Definitely not.”)

Incidentally, the initial investigations into compute shaders have shed light on vertex shaders, which are internally structured like compute jobs on both Bifrost and Midgard, with each vertex corresponding to a single invocation. However, Connor had a more immediate purpose in mind with his pursuit of compute shaders: gaining the ability to execute arbitrary shader binaries on a Bifrost board, implementing a “shader runner” to use the trochaic name. Midgard never needed this tool, as I have generally implemented the command stream parts of the driver in advance of the compiler, but Bifrost’s ISA decoding is considerably further along than its graphics pipeline. Through his earlier work with panwrap, he was successfully able to build such a tool, directly submitting simple compute jobs with offline precompiled shaders to the Bifrost GPU!

That does beg the question – where do these shader binaries come from, as work has not yet begun on the Bifrost compiler? As you may know, Lyude has been at work implementing a Bifrost assembler, a large task given the sheer complexity of Bifrost, especially in comparison to Midgard – and a task made even harder without the ability to test on either real hardware or emulators. Paired with Connor’s shader runner, however, the assembler’s shader binaries can be tested, and her assembler is showing signs of life! Pairing these two components has enabled the first free shaders to run on Bifrost. While there are a few unimplemented opcodes remaining, the assembler is working well.

Returning to the shader runner, beyond its immense value as a proof-of-concept, a side benefit of this instrumentation is enabling fine-grained ISA decoding. Whereas Midgard uses high-level arithmetic opcodes, with dedicated instructions for operations like reciprocal square root, Bifrost uses a minimalist architecture, dividing complex high-level operations into many small instructions. However, it can be difficult to discern the individual meaning of each micro opcode from simply studying disassembled compiled shaders. Nevertheless, with the new ability to assemble and run shaders, it is possible to test micro operations individually, determining their functions outside the high-level sequence. Through this technique, Connor has been able to understand the fine-grained details of operations like division on Bifrost.

All in all, our understanding of Bifrost is beginning to converge with that of Midgard. The next step, of course, will be to implement the Bifrost compiler and extend the Midgard driver to support Bifrost command streams. But hey, if someone had told me a year ago that we could run code natively, bloblessly, on virtually every Mali device, I don’t think I would have believed them. But between our work on Midgard and Bifrost, not to mention the Lima project’s work on Utgard, this dream has become a reality. In the words of Equestria Girls, what a strange new world…


But let’s not get ahead of ourselves. On the Mali T860 (Midgard), Freedreno’s test-cat and glmark2’s equivalent build test run successfully. The shaded cat screenshot at the beginning of the post is a native blobless render on the T860. Of course, the tests demoed on previous blog posts, like the shiny textured cube, continue to work on the new hardware.

Oh, and save for a funky viewport (corrected here) and some sporadic artefacts, it runs ’gears:

es2gears on the T860 with the blobless Panfrost stack

Meow!

August 30, 2018

Intro slide

Downloads

If you're curious about the slides, you can download the PDF or the ODP.

Thanks

This post has been a part of work undertaken by my employer Collabora.

I would like to thank the wonderful organizers of OSSummit NA, the Linux Foundation, for hosting a great event.

August 27, 2018

Window Scaling

One of the ideas we had in creating the compositing mechanism was to be able to scale window contents for the user -- having the window contents available as an image provides for lots of flexibility for presentation.

However, while we've seen things like “overview mode” (presenting all of the application windows scaled and tiled for easy selection), we haven't managed to interact with windows in scaled form. That is, until yesterday.

glxgears thinks the window is only 32x32 pixels in size. xfd is scaled by a factor of 2. xlogo is drawn at the normal size.

Two Window Sizes

The key idea for window scaling is to have the X server keep track of two different window sizes -- the area occupied by the window within its parent, and the area available for the window contents, including descendents. For now, at least, the origin of the window is the same between these two spaces, although I don't think there's any reason they would have to be.

  • Current Size. This is the size as seen from outside the window, and as viewed by all clients other than the owner of the window. It reflects the area within the parent occupied by the window, including the area which captures pointer events. This can probably use a better name.

  • Owner Size. This is the size of the window viewed from inside the window, and as viewed by the owner of the window. When composited, the composite pixmap gets allocated at this size. When automatically composited, the X server will scale the image of the window from this size to the current size.

Clip Lists

Normally, when computing the clip list for a composited window, the X server uses the current size of the window (aka the “borderSize” region) instead of just the portion of the window which is not clipped by the ancestor or sibling windows. This is how we capture output which is covered by those windows and can use it to generate translucent effects.

With an output size set, instead of using the current size, I use the owner size instead. All un-redirected descendents are thus clipped to this overall geometry.

Sub Windows

Descendent windows are left almost entirely alone; they keep their original geometry, both position and size. Because the output sized window retains its original position, all of the usual coordinate transformations 'just work'. Of course, the clipping computations will start with a scaled clip list for the output sized window, so the descendents will have different clipping. There's surprisingly little effect otherwise.

Output Handling

When an owner size is set, the window gets compositing enabled. The composite pixmap is allocated at the owner size instead of the current size. When no compositing manager is running, the automatic compositing painting code in the server now scales the output from the output size to the current size.

Most X applications don't have borders, but I needed to figure out what to do in case one appeared. I decided that the border should be the same size in the output and current presentations. That's about the only thing that I could get to make sense; the border is 'outside' the window size, so if you want to make the window contents twice as big, you want to make the window size twice as big, not some function of the border width.

About the only trick was getting the transformation from output size to current size correct in the presence of borders. That took a few iterations, but I finally just wrote down a few equations and solved for the necessary values. Note that Render transforms take destination space coordinates and generate source space coordinates, so they appear “backwards”. While Render supports projective transforms, this one is just scaling and translation, so we just need:

x_output_size = A * x_current_size + B

Now, we want the border width for input and output to be the same, which means:

border_width + output_size = A * (border_width + current_size) + B
border_width               = A * border_width                   + B

Now we can solve for A:

output_size = A * current_size
A = output_size / current_size

And for B:

border_width = output_size / current_size * border_width + B
B = (1 - output_size / current_size) * border_width

With these, we can construct a suitable transformation matrix:

⎡ Ax  0 Bx ⎤
⎢  0 Ay By ⎥
⎣  0  0  1 ⎦

Input Handling

Input device root coordinates need to be adjusted for owner sized windows. If you nest an owner sized window inside another owner sized window, then there are two transformations involved.

There are actually two places where these transformations need to be applied:

  1. To compute which window the pointer is in. If an output sized window has descendents, then the position of the pointer within the output window needs to be scaled so that the correct descendent is identified as containing the pointer.

  2. To compute the correct event coordinates when sending events to the window. I decided not to attempt to separate the window owner from other clients for event delivery; all clients see the same coordinates in events.

Both of these require the ability to transform the event coordinates relative to the root window. To do that, we translate from root coordinates to window coordinates, scale by the ratio of output to current size and then translate back:

void
OwnerScaleCoordinate(WindowPtr pWin, double *xd, double *yd)
{
    if (wOwnerSized(pWin)) {
        *xd = (*xd - pWin->drawable.x) * (double) wOwnerWidth(pWin) /
            (double) pWin->drawable.width + pWin->drawable.x;
        *yd = (*yd - pWin->drawable.y) * (double) wOwnerHeight(pWin) /
            (double) pWin->drawable.height + pWin->drawable.y;
    }
}

This moves the device to the scaled location within the output sized windows. Performing this transformation from the root window down to the target window adjusts the position correctly even when there is more than one output sized window among the window ancestry.

Case 1. is easy; XYToWindow, and the associated miSpriteTrace function, already traverse the window tree from the root for each event. Each time we descend through a window, we apply the transformation so that subsequent checks for descendents will check the correct coordinates. At each step, I use OwnerScaleCoordinate for the transformation.

Case 2. means taking an arbitrary window and walking up the window tree to the root and then performing each transformation on the way back down. Right now, I'm doing this recursively, but I'm reasonably sure it could be done iteratively instead:

void
ScaleRootCoordinate(WindowPtr pWin, double *xd, double *yd)
{
    if (pWin->parent)
        ScaleRootCoordinate(pWin->parent, xd, yd);
    OwnerScaleCoordinate(pWin, xd, yd);
}
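
An iterative variant is indeed possible; here's a quick sketch (not part of the actual branch) that collects the ancestry first and then applies the per-window scaling from the root downwards, matching what the recursive version does:

void
ScaleRootCoordinateIterative(WindowPtr pWin, double *xd, double *yd)
{
    /* Sketch only: assumes a nesting depth of at most 64; a real
     * implementation would want to handle arbitrarily deep trees. */
    WindowPtr   chain[64];
    int         depth = 0;

    for (WindowPtr w = pWin; w && depth < 64; w = w->parent)
        chain[depth++] = w;

    while (depth > 0)
        OwnerScaleCoordinate(chain[--depth], xd, yd);
}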

Events and Replies

To make the illusion for the client work, everything the client hears about the window needs to be adjusted so that the window seems to be the owner size and not the current size.

  • Input events. The root coordinates are modified as described above, and then the window-relative coordinates are computed as usual—by subtracting the window origin from the root position. That's because the windows are all left in their original location.

  • ConfigureNotify events. These events are rewritten before being delivered to the owner so that the width and height reflect the owner size. Because window managers send synthetic configure notify events when moving windows, I also had to rewrite those events, or the client would get the wrong size information.

  • PresentConfigureNotify events. For these, I decided to rewrite the size values for all clients. As these are intended to be used to allocate window buffers for presentation, the right size is always the owner size.

  • OwnerWindowSizeNotify events. I created a new event so that the compositing manager could track the owner size of all child windows. That's necessary because the X server only performs the output size scaling operation for automatically redirected windows; if the window is manually redirected, then the compositing manager will have to perform the scaling operation instead.

  • GetGeometry replies. These are rewritten for the window owner to reflect the owner size value. Other clients see the current size instead.

  • GetImage replies. I haven't done this part yet, but I think I need to scale the window image for clients other than the owner. In particular, xwd currently fails with a Match error when it sees a window with a non-default visual that has an output size smaller than the window size. It tries to perform a GetImage operation using the current size, which fails when the server tries to fetch that rectangle from the owner-sized window pixmap.

Composite Extension Changes

I've stuck all of this stuff into the Composite extension; mostly because you need to use Composite to capture the scaled window output anyways.

12. Composite Events (0.5 and later)

Version 0.5 of the extension defines an event selection mechanism and a couple of events.

COMPOSITEEVENTTYPE {
    CompositePixmapNotify = 0
    CompositeOwnerWindowSizeNotify = 1
}

Event type delivered in events

COMPOSITEEVENTMASK {
    CompositePixmapNotifyMask = 0x0001
    CompositeOwnerWindowSizeNotifyMask = 0x0002
}

Event select mask for CompositeSelectInput

⎡
⎢    CompositeSelectInput
⎢
⎢                window:                                Window
⎢                enable                                SETofCOMPOSITEEVENTMASK
⎣

This request selects the set of events that will be delivered to the client from the specified window.

CompositePixmapNotify
    type:            CARD8          XGE event type (35)
    extension:       CARD8          Composite extension request number
    sequence-number: CARD16
    length:          CARD32         0
    evtype:          CARD16         CompositePixmapNotify
    window:          WINDOW
    windowWidth:     CARD16
    windowHeight:    CARD16
    pixmapWidth:     CARD16
    pixmapHeight:    CARD16

This event is delivered whenever the composite pixmap for a window is created, changed or deleted. When the composite pixmap is deleted, pixmapWidth and pixmapHeight will be zero. The client can call NameWindowPixmap to assign a resource ID for the new pixmap.

13. Output Window Size (0.5 and later)

⎡
⎢    CompositeSetOwnerWindowSize
⎢
⎢                window:                                Window
⎢                width:                                 CARD16
⎢                height:                                CARD16
⎣

This request specifies that the owner-visible window size will be set to the provided value, overriding the actual window size as seen by the owner. If composited, the composite pixmap will be created at this size. If automatically composited, the server will scale the output from the owner size to the current window size.

If the window is mapped, an UnmapWindow request is performed automatically first. Then the owner size is set. A CompositeOwnerWindowSizeNotify event is then generated. Finally, if the window was originally mapped, a MapWindow request is performed automatically.

Setting the width and height to zero will clear the owner size value and cause the window to resume normal behavior.

Input events will be scaled from the actual window size to the owner size for all clients.

A Match error is generated if:

  • The window is a root window
  • One, but not both, of width/height is zero

And, of course, you can retrieve the current size too:

⎡
⎢    CompositeGetOwnerWindowSize
⎢
⎢                window:                                Window
⎢
⎢                →
⎢
⎢                width:                                 CARD16
⎢                height:                                CARD16
⎣

This request returns the current owner window size, if set. Otherwise it returns 0,0, indicating that there is no owner window size set.

CompositeOwnerWindowSizeNotify
    type:            CARD8          XGE event type (35)
    extension:       CARD8          Composite extension request number
    sequence-number: CARD16
    length:          CARD32         0
    evtype:          CARD16         CompositeOwnerWindowSizeNotify
    window:          WINDOW
    windowWidth:     CARD16
    windowHeight:    CARD16
    ownerWidth:      CARD16
    ownerHeight:     CARD16

This event is generated whenever the owner size of the window is set. windowWidth and windowHeight report the current window size. ownerWidth and ownerHeight report the owner window size.

Git repositories

These changes are in various repositories at gitlab.freedesktop.org all using the “window-scaling” branch:

And here's a sample command line app which modifies the owner scaling value for an existing window:

Current Status

This stuff is all very new; I started writing code on Friday evening and got a simple test case working. I then spent Saturday making most of it work, and today finding a pile of additional cases that needed handling. I know that GetImage is broken; I'm sure lots of other stuff is also not quite right.

I'd love to get feedback on whether the API and feature set seem reasonable or not.

The DRM (direct rendering manager, not the content protection stuff) graphics subsystem in the linux kernel does not have a generic 2D acceleration API, despite an awful lot of GPUs having more or less featureful blitter units. And many systems need them for a lot of use-cases, because the 3D engine is a bit too slow or too power hungry for just rendering desktops.

It’s a FAQ why this doesn’t exist and why it won’t get added, so I figured I’ll answer this once and for all.

Bit of nomenclature upfront: A 2D engine (or blitter) is a bit of hardware that can copy stuff with some knowledge of the 2D layout usually used for pixel buffers. Some blitters can also do more, like basic blending, converting color spaces or stretching/scaling. A 3D engine on the other hand is the fancy high-performance compute block, which runs small programs (called shaders) on a massively parallel architecture, generally with huge memory bandwidth and a dedicated controller to feed this beast through an asynchronous command buffer. 3D engines happen to be really good at rendering the pixels for 3D action games, among other things.

There’s no 2D Acceleration Standard

3D has it easy: There’s OpenGL and Vulkan and DirectX that require a certain feature set. And huge market forces that make sure if you use these features like a game would, rendering is fast.

Aside: This means the 2D engine in a browser actually needs to work like a 3D action game, or the GPU will crawl. The impedance mismatch compared to traditional 2D rendering designs is huge.

On the 2D side there’s no such thing: Every blitter engine is its own bespoke thing, with its own features, limitations and performance characteristics. There’s also no standard benchmarks that would drive common performance characteristics - today blitters are neeeded mostly in small systems, with very specific use cases. Anything big enough to run more generic workloads will have a 3D rendering block anyway. These systems still have blitters, but mostly just to help move data in and out of VRAM for the 3D engine to consume.

Now the huge problem here is that you need to fill these gaps in various hardware 2D engines using CPU side software rendering. The crux with any 2D render design is that transferring buffers and data too often between the GPU and CPU will kill performance. Usually the cliff is so steep that pure CPU rendering using only software easily beats any simplistic 2D acceleration design.

The only way to fix this is to be really careful when moving data between the CPU and GPU for different rendering operations. Sticking to one side, even if it's a bit slower, tends to be an overall win. But these decisions highly depend upon the exact features and performance characteristics of your 2D engine. Putting a generic abstraction layer in the middle of this stack, where it's guaranteed to be if you make it a part of the kernel/userspace interface, will not result in actual acceleration.

So either you make your 2D rendering look like it’s a 3D game, using 3D interfaces like OpenGL or Vulkan. Or you need a software stack that’s bespoke to your use-case and the specific hardware you want to run on.

2D Acceleration is Really Hard

This is the primary reason really. If you don't believe that, look at all the tricks a browser employs to render CSS and HTML and text really fast, while still animating all that stuff smoothly. Yes, a web browser is the pinnacle of current 2D acceleration tech, and you really need all the things in there for decent performance: scene graphs, clever render culling, massive batching and huge amounts of pain to make sure you don't have to fall back to CPU-based software rendering at the wrong point in a rendering pipeline. Plus managing all kinds of assorted caches to balance reuse against running out of memory.

Unfortunately lots of people assume 2D must be a lot simpler than 3D rendering, and therefore they can design a 2D API that’s fast enough for everyone. No one jumps in and suggests we’ll have a generic 3D interface at the kernel level, because the lessons there are very clear:

  • The real application interface is fairly high level, and in userspace.

  • There’s a huge industry group doing really hard work to specify these interfaces.

  • The actual kernel-to-userspace interfaces end up being highly specific to the hardware and architecture of the userspace driver (which contains most of the magic). Any attempt at a generic interface leaves lots of hardware-specific tricks, and hence performance, on the floor.

  • 3D APIs like OpenGL or Vulkan have all the batching and queueing and memory management issues covered in one way or another.

There are a bunch of DRM drivers which have support for 2D render engines exposed to userspace. But they all use highly hardware-specific interfaces, fully streamlined for the specific engine. And they all require a decently sized chunk of driver code in userspace to translate from a generic API to the hardware formats. This is what DRM maintainers will recommend you do, if you submit a patch to add a generic 2D acceleration API.

Exactly like a 3D driver.

If All Else Fails, There’s Options

Now if you don’t care about the last bit of performance, and your use-case is limited, and your blitter engine is limited, then there’s already options:

You can take whatever pixel buffer you have, export it as a dma-buf, and then import it into some other subsystem which already has some kind of limited 2D acceleration support. Depending upon your blitter engine that could be a v4l2 mem2m device, or for simpler things there's also dmaengine.
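
The export side of that is the generic PRIME ioctl, which libdrm already wraps. A minimal sketch (the GEM handle is assumed to come from your driver-specific allocation code):

#include <stdint.h>
#include <xf86drm.h>

/* Export a GEM buffer as a dma-buf fd that can then be imported by e.g.
 * a v4l2 mem2m device. Returns the dma-buf fd, or -1 on error. */
int export_pixel_buffer(int drm_fd, uint32_t gem_handle)
{
    int dmabuf_fd = -1;

    if (drmPrimeHandleToFD(drm_fd, gem_handle, DRM_CLOEXEC, &dmabuf_fd) != 0)
        return -1;

    return dmabuf_fd;
}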

On top of that, the DRM subsystem does allow you to implement the traditional acceleration methods exposed by the fbdev subsystem, in case you have userspace that really insists on using these; it's not recommended for anything new.

What about KMS?

The above is kind of a lie, since the KMS (kernel modesetting) IOCTL userspace API is a fairly full-featured 2D rendering interface. The aim of course is to render different pixel buffers onto a screen. With the recently added writeback support, operations targeting memory are now possible. This could be used to expose a traditional blitter, if you only expose writeback support and no other outputs in your KMS driver.

There’s a few downsides:

  • KMS is highly geared for compositing just a few buffers (hardware usually has a very limited set of planes). For accelerated text rendering you want to do a composite operation for each character, which means this has rather limited use.

  • KMS only needs to run at 60Hz, or whatever the refresh rate of your monitor is. It’s not optimized for efficiency at higher throughput at all.

So all together this isn’t the high-speed 2D accelaration API you’re looking for either. It is a valid alternative to the options above though, e.g. instead of a v4l2 mem2m device.

FAQ for the FAQ, or: OpenVG?

OpenVG isn’t the standard you’re looking for either. For one it’s a userspace API, like OpenGL. All the same reasons for not implementing a generic OpenGL interface at the kernel/userspace apply to OpenVG, too.

Second, the Mesa3D userspace library did support OpenVG once. Didn’t gain traction, got canned. Just because it calls itself a standard doesn’t make it a widely adopted industry default. Unlike OpenGL/Vulkan/DirectX on the 3D side.

Thanks to Dave Airlie and Daniel Stone for reading and commenting on drafts of this text.

August 24, 2018
robertfoss@xps9570 ~/work/libdrm $ git ru
remote: Counting objects: 234, done.
remote: Compressing objects: 100% (233/233), done.
remote: Total 234 (delta 177), reused 0 (delta 0)
Receiving objects: 100% (234/234), 53.20 KiB | 939.00 KiB/s, done.
Resolving deltas: 100% (177/177), completed with 36 local objects.
From git://anongit.freedesktop.org/mesa/drm
   cb592ac8166e..bcb9d976cd91  master     -> upstream/master
 * [new tag]                   libdrm-2.4.93 -> libdrm-2.4.93
 * [new tag]                   libdrm-2.4.94 -> libdrm-2.4.94

The idea here is that by issuing a single short command we can fetch the latest master branch from the upstream repository of the codebase we're working on and set our local master branch to point to the most recent upstream/master one …
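
The post's actual alias definition isn't shown above, but one possible way to wire up such a shortcut in ~/.gitconfig (assuming the upstream repository is configured as a remote called "upstream") would be something along these lines:

[alias]
        # Hypothetical sketch; the real alias may differ.
        # Fetch the upstream remote (updating upstream/master and tags),
        # then fast-forward the local master branch to match. The second
        # fetch refuses to touch master while it is checked out.
        ru = !git fetch upstream && git fetch upstream master:master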

August 16, 2018

This is mostly a request for testing, because I've received zero feedback on the patches that I merged a month ago and libinput 1.12 is due to be out. No comments so far on the RC1 and RC2 either, so... well, maybe this gets a bit broader attention so we can address some things before the release. One can hope.

Required reading for this article: Observations on trackpoint input data and X server pointer acceleration analysis - part 5.

As the blog posts linked above explain, the trackpoint input data is difficult and largely arbitrary between different devices. The previous pointer acceleration in libinput relied on a fixed reporting rate, which doesn't hold at low speeds, so the new acceleration method switches back to velocity-based acceleration. i.e. we convert the input deltas to a speed, then apply the acceleration curve on that. It's not speed, it's pressure, but it doesn't really matter unless you're a stickler for technicalities.

Because basically every trackpoint has different random data ranges not linked to anything easily measurable, libinput's device quirks now support a magic multiplier to scale the trackpoint range into something resembling a sane range. This is basically what we did before with the systemd POINTINGSTICK_CONST_ACCEL property except that we're handling this in libinput now (which is where acceleration is handled, so it kinda makes sense to move it here). There is no good conversion from the previous trackpoint range property to the new multiplier because the range didn't really have any relation to the physical input users expected.
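
If your device needs such a multiplier, the quirks entry looks roughly like the following. The match values are invented and the exact key names should be double-checked against the libinput quirks documentation; this is only meant to show the shape of the entry:

[Example Laptop Trackpoint]
MatchUdevType=pointingstick
MatchName=*Example TrackPoint*
MatchDMIModalias=dmi:*svnExampleVendor*
AttrTrackpointMultiplier=1.25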

So what does this mean for you? Test the libinput RCs or, better, libinput from master (because it's stable anyway), or from the Fedora COPR and check if the trackpoint works. If not, check the Trackpoint Configuration page and follow the instructions there.

August 13, 2018

GSoC Final Report


Nothing lasts forever, and this also applies to GSoC projects. In this report, I tried to summarize my experience in the DRI community and my contributions.

Recap the project idea

First, it is important to remember the main subject of my GSoC Project:

The Kernel Mode-Setting (KMS) is a mechanism that enables a process to command the kernel to set a mode (screen resolution, color depth, and rate) which is in the range of values supported by the graphics card and the display screen. Creating a Virtual KMS (VKMS) has benefits. First, it could be used for testing; second, it can be valuable for running X or Wayland on a headless machine, enabling the use of the GPU. This module is similar to VGEM, and in some ways to VIRTIO. Once VKMS gets mature enough, it will be used to run i-g-t test cases and to automate userspace testing.

I heard about VKMS in the DRM TODO list and decided to apply for GSoC with this project. A very talented developer from Saudi Arabia named Haneen Mohammed had the same idea but applied to the Outreachy program. We worked together with the desire to push the Virtual KMS forward as hard as we could.

Overcome the steep learning curve

In my opinion, the main reason for the steep learning curve was my lack of background in how the graphics stack works. For example, when I took operating system classes, I studied many things related to schedulers, memory and disk management, and so forth; on the other hand, I only had a 10000-foot view of graphics systems. After long hours of studying and coding, I started to understand better how things work. It is incredible how much progress the DRI developers have made in the last few years! I wish that new editions of operating systems books had a whole chapter on this subject.

I still have problems understanding all the mechanisms available in the DRM; however, I now feel confident about how to read the code/documentation and get into the details of the DRM subsystem. I have plans to compile all the knowledge acquired during the project in a series of blog posts.

Contributions

During my work in GSoC, I sent my patches to the DRI mailing list and constantly got feedback to improve my work; as a result, I reworked most of my patches. The natural and reliable way to track the contributions is by running git log --author="Rodrigo Siqueira" in one of the repositories below:

  • For DRM patches: git://anongit.freedesktop.org/drm-misc
  • For patches already applied to Torvalds branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  • For IGT patches: git://anongit.freedesktop.org/drm/igt-gpu-tools

In summary, these are the main patches that I got accepted:

  • drm/vkms: Fix connector leak at the module removal
  • drm/vkms: Add framebuffer and plane helpers
  • drm/vkms: Add vblank events simulated by hrtimers
  • drm/vkms: Add connectors helpers
  • drm/vkms: Add dumb operations
  • drm/vkms: Add extra information about vkms
  • drm/vkms: Add basic CRTC initialization
  • drm/vkms: Add mode_config initialization

We received two contributions from external people; I reviewed both patches:

  • drm/vkms: Use new return type vm_fault_t
  • drm/vkms: Fix the error handling in vkms_init()

I am using IGT to test VKMS; for this reason, I decided to send some contributions to it. I sent a series of patches fixing GCC warnings:

  • Fix comparison that always evaluates to false
  • Avoid truncate string in __igt_lsof_fds
  • Remove parameter aliases with another argument
  • Move declaration to the top of the code
  • Account for NULL character when using strncpy
  • Make string commands dynamic allocate (waiting for review)
  • Fix truncate string in the snprintf (waiting for review)

I also sent a patchset with the goal of adding support for forcing a specific module to be used by IGT tests:

  • Add support to force specific module load
  • Increase the string size for a module name (waiting for review)
  • Add support for forcing specific module (waiting for review)

As a miscellaneous contribution, I created a series of scripts to automate the workflow of Linux kernel development. This small project was based on a series of scripts provided by my mentor, and I hope it can be useful for newcomers. The project link follows:

  1. Kworkflow

Work in Progress

I am glad to say that I accomplished all the tasks initially proposed, and I did much more. Now I am working to make VKMS work without vblank. This is still a work in progress, but I am confident that I can finish it soon. Finally, it is important to highlight that my GSoC participation will finish at the end of August because I traveled for two weeks to attend debconf2018.

Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning - Winston Churchill

GSoC gave me one thing that I had been pursuing for a long time: a subsystem in the Linux kernel that I can focus on for years. I am delighted that I found a place to focus, and I will keep working on VKMS until it is finished.

Finally, the Brazilian government opened a call for encouraging free software development, and I decided to apply with the VKMS project. Last week, I received the great news that I was selected in the first phase, and now I am waiting for the final results. If everything ends well for me, I will receive funding to work for 5 months on VKMS and the DRM subsystem.

My huge thanks to…

I received support from many people in the dri-devel channel and mailing list. I want to thank everybody for all the support and patience.

I want to thank Daniel Vetter for all the feedback and assistance in the VKMS work. I also want to thank Gustavo Padovan for all the support that he provided to me (which included some calls with great explanations about the DRM). Finally, I want to thank Haneen for all the help and great work.

Reference

  1. Use new return type vm_fault_t
  2. Fix the error handling in vkms_init()
  3. Add support to force specific module load

August 09, 2018

libinput made a design decision early on to use physical reference points wherever possible. So your virtual buttons are X mm high/across, the pointer movement is calculated in mm, etc. Unfortunately this exposed us to a large range of devices that don't bother to provide that information or just give us the wrong information to begin with. Patching the kernel for every device is not feasible, so in 2015 the 60-evdev.hwdb was born and it has seen steady updates since. Many a libinput bug was fixed by just correcting the device's axis ranges or resolution. To take the magic out of the 60-evdev.hwdb, here's a blog post for your perusal, appreciation or, failing that, shaking a fist at. Note that the below is caller-agnostic, it doesn't matter what userspace stack you use to process your input events.

There are four parts that come together to fix devices: a kernel ioctl and a trifecta of udev rules, hwdb entries and a udev builtin.

The kernel's EVIOCSABS ioctl

It all starts with the kernel's struct input_absinfo.


struct input_absinfo {
        __s32 value;
        __s32 minimum;
        __s32 maximum;
        __s32 fuzz;
        __s32 flat;
        __s32 resolution;
};
The three values that matter right now: minimum, maximum and resolution. The "value" is just the most recent value on this axis; ignore fuzz/flat for now. The min/max values simply specify the range of values the device will give you, the resolution how many values per mm you get. Simple example: an x axis given at min 0, max 1000 at a resolution of 10 means your device is 100mm wide. There is no requirement for min to be 0, btw, and there's no clipping in the kernel so you may get values outside min/max. Anyway, your average touchpad looks like this in evemu-record:

# Event type 3 (EV_ABS)
# Event code 0 (ABS_X)
# Value 2572
# Min 1024
# Max 5112
# Fuzz 0
# Flat 0
# Resolution 41
# Event code 1 (ABS_Y)
# Value 4697
# Min 2024
# Max 4832
# Fuzz 0
# Flat 0
# Resolution 37
This is the information returned by the EVIOCGABS ioctl (EVdev IOCtl Get ABS). It is usually run once on device init by any process handling evdev device nodes.

Because plenty of devices don't announce the correct ranges or resolution, the kernel provides the EVIOCSABS ioctl (EVdev IOCtl Set ABS). This allows overwriting the in-kernel struct with new values for min/max/fuzz/flat/resolution, processes that query the device later will get the updated ranges.
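
For the curious, applying the ioctl by hand (outside of udev) is straightforward plain evdev API; here's a small sketch that overrides only the resolution of ABS_X on a hypothetical event node:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/input.h>

/* Sketch: override the x axis resolution on the given evdev node. */
static int set_x_resolution(const char *node, int res)
{
    struct input_absinfo abs;
    int fd = open(node, O_RDWR);

    if (fd < 0)
        return -1;
    ioctl(fd, EVIOCGABS(ABS_X), &abs);   /* read the current values */
    abs.resolution = res;                /* e.g. 30 units per mm */
    ioctl(fd, EVIOCSABS(ABS_X), &abs);   /* write the override back */
    close(fd);
    return 0;
}

/* e.g. set_x_resolution("/dev/input/event5", 30); */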

udev rules, hwdb and builtins

The kernel has no notification mechanism for updated axis ranges so the ioctl must be applied before any process opens the device. This effectively means it must be applied by a udev rule. udev rules are a bit limited in what they can do, so if we need to call an ioctl, we need to run a program. And while udev rules can do matching, the hwdb is easier to edit and maintain. So the pieces we have are: a hwdb that knows when to change (and the values), a udev program to apply the values and a udev rule to tie those two together.

In our case the rule is 60-evdev.rules. It checks the 60-evdev.hwdb for matching entries [1], then invokes the udev-builtin-keyboard if any matching entries are found. That builtin parses the udev properties assigned by the hwdb and converts them into EVIOCSABS ioctl calls. These three pieces need to agree on each other's formats - the udev rule and hwdb agree on the matches and the hwdb and the builtin agree on the property names and value format.

By itself, the hwdb has no specific format beyond this:


some-match-that-identifies-a-device
PROPERTY_NAME=value
OTHER_NAME=othervalue
But since we want to match for specific use-cases, our udev rule assembles several specific match lines. Have a look at 60-evdev.rules again, the last rule in there assembles a string in the form of "evdev:name:the device name:content of /sys/class/dmi/id/modalias". So your hwdb entry could look like this:

evdev:name:My Touchpad Name:dmi:*svnDellInc*
EVDEV_ABS_00=0:1:3
If the name matches and you're on a Dell system, the device gets the EVDEV_ABS_00 property assigned. The "evdev:" prefix in the match line is merely to distinguish from other match rules to avoid false positives. It can be anything, libinput unsurprisingly used "libinput:" for its properties.

The last part now is understanding what EVDEV_ABS_00 means. It's a fixed string with the axis number as hex number - 0x00 is ABS_X. And the values afterwards are simply min, max, resolution, fuzz, flat, in that order. So the above example would set min/max to 0:1 and resolution to 3 (not very useful, I admit).

Trailing values can be skipped altogether, and values that don't need overriding can be skipped as well, provided the colons are in place. So the common use-case of overriding a touchpad's x/y resolution looks like this:


evdev:name:My Touchpad Name:dmi:*svnDellInc*
EVDEV_ABS_00=::30
EVDEV_ABS_01=::20
EVDEV_ABS_35=::30
EVDEV_ABS_36=::20
0x00 and 0x01 are ABS_X and ABS_Y, so we're setting those to 30 units/mm and 20 units/mm, respectively. And if the device is multitouch capable we also need to set ABS_MT_POSITION_X and ABS_MT_POSITION_Y to the same resolution values. The min/max ranges for all axes are left as-is.

The most confusing part is usually: the hwdb uses a binary database that needs updating whenever the hwdb entries change. A call to systemd-hwdb update does that job.

So with all the pieces in place, let's see what happens when the kernel tells udev about the device:

  • The udev rule assembles a match and calls out to the hwdb,
  • The hwdb applies udev properties where applicable and returns success,
  • The udev rule calls the udev keyboard builtin,
  • The keyboard builtin parses the EVDEV_ABS_xx properties and issues an EVIOCSABS ioctl for each axis,
  • The kernel updates the in-kernel description of the device accordingly,
  • The udev rule finishes and udev sends out the "device added" notification,
  • The userspace process sees the "device added" notification and opens the device, which now has corrected values,
  • Celebratory champagne corks are popping everywhere, hands are shaken, shoulders are patted in congratulations of another device saved from the tyranny of wrong axis ranges/resolutions.

Once you understand how the various bits fit together it should be quite easy to understand what happens. The remainder is then just adding hwdb entries where necessary; the touchpad-edge-detector tool is useful for figuring those out.

[1] Not technically correct, the udev rule merely calls the hwdb builtin which searches through all hwdb entries. It doesn't matter which file the entries are in.

August 01, 2018

For some time now I have been supporting two Linux developers on Patreon. Namely Ryan Gordon of Linux game porting and SDL development fame and Tanu Kaskinen who is a lead developer on PulseAudio these days.

One of the things I often think about is how we can enable more people to make a living from working on the Linux desktop and related technologies. If you're reading my blog there is a good chance that you are enabling people to make a living working on the Linux desktop by paying for RHEL Workstation subscriptions through your work. So a big thank you for that. The fact that Red Hat has paying customers for our desktop products is critical in terms of our ability to do so much of the maintenance and development work we do around the Linux Desktop and Linux graphics stack.

That said I do feel we need more venues than just employment by companies such as Red Hat, and this is where I would love to see more people supporting their favourite projects and developers through, for instance, Patreon. Because unlike one-off funding campaigns, repeat crowdfunding like Patreon can give developers a predictable income, which means they don't have to worry about how to pay their rent or how to feed their kids.

So in terms of the two Patreons I support, Ryan is probably the closest to being able to rely on it for his livelihood, but of course more Patreon supporters will enable Ryan to be even less reliant on payments from game makers. And Tanu's Patreon income at the moment is helping him to spend quite a bit of time on PulseAudio, but it is definitely not providing him with a living income. So if you are reading this I strongly recommend that you support Ryan Gordon and Tanu Kaskinen on Patreon. You don't need to pledge a lot, I think in general it is in fact better to have many people pledging 10 dollars a month than a few pledging hundreds, because the impact of one person coming or going is thus a lot less. And of course this is not just limited to Ryan and Tanu, search around and see if any projects or developers you personally care deeply about are using crowdfunding and support them, because if more of us did so then more people would be able to make a living off developing our favourite open source software.

Update: Seems I wasn’t the only one thinking about this, Flatpak announced today that application devs can put their crowdfunding information into their flatpaks and it will be advertised in GNOME Software.

I'd had a bit of a break from adding features to virgl while I was working on radv, but recently Google and Collabora have started to invest in virgl as a solution. A number of developers from both companies have joined the project.

This meant trying to get virgl to pass their dEQP suite and adding support for newer GL/GLES feature levels. They also have a goal for the renderer to run on a host GLES implementation, whereas it previously only ran on host GL.

Over the past few months I've worked with the group to add support for all the necessary features needed for guest GLES3.1 support (for them) and GL4.3 (for me).

The feature list was roughly:
  • tessellation shaders
  • fp64 support
  • ARB_gpu_shader5 support
  • Shader buffer objects
  • Shader image objects
  • Compute shaders
  • Copy Image
  • Texture views

With this list implemented we achieved GL4.3 and GLES3.1.

However, Marek@AMD did some work on exposing ASTC for gallium drivers, and with some extra work on EXT_shader_framebuffer_fetch, we now expose GLES3.2.

There was also plenty of work done on avoiding crashes from rogue guests (rewrote the whole feature/capability bit handling), and lots of bug fixes. There are still ongoing fixes to finish the dEQP tests, but it looks like all the feature work should be landed now.

What next?

Well there is one big problem facing virgl in exposing GL 4.4. GL_ARB_buffer_storage requires exposing coherent buffer memory, and with the virgl architecture, we currently don't have a way to map the pages behind a host GL buffer mapping into a guest GL buffer mapping in order to achieve coherency. This is going to require some thought and it may even require exposing some new GL extensions to export a buffer to a dma-buf.

There has also been a GSoC student, Nathan, working on Vulkan/virgl support. He's made some initial progress; however, Vulkan also has requirements on coherent memory, so that tricky problem needs to be solved too.

Thanks again to all the contributors to the virgl project.

To make testing libinput git master easier, I set up a whot/libinput-git Fedora COPR yesterday. This repo gets the push triggers directly from GitLab so it will rebuild with whatever is currently on git master.

To use the COPR, simply run:


sudo dnf copr enable whot/libinput-git
sudo dnf upgrade libinput
This will give you the libinput package from git. It'll have a date/time/git sha based NVR, e.g. libinput-1.11.901-201807310551git22faa97.fc28.x86_64. Easy to spot at least.

To revert back to the regular Fedora package run:


sudo dnf copr disable whot/libinput-git
sudo dnf distro-sync "libinput-*"

Disclaimer: This is an automated build so not every package is tested. I'm running git master exclusively (from a ninja install) and I don't push to master unless the test suite succeeds. So the risk of ending up with a broken system is low.

On that note: if you are maintaining a similar repo for other distributions and would like me to add a push trigger in GitLab for automatic rebuilds, let me know.

July 31, 2018

Stack overview

Let's start with having a look at a high level overview of what the graphics stack looks like.


Before digging too much further into this, lets cover some terminology.

DRM - Direct Rendering Manager - is the Linux kernel graphics subsystem, which contains all of the graphics drivers and does all of the interfacing with hardware.
The DRM subsystem implements the KMS - kernel mode setting - API.

Mode setting is essentially configuring output settings like resolution for the displays that are being used. And doing it using the kernel means that userspace doesn't need access to setting these things directly.


The DRM subsystem talks to the hardware and Mesa is used by applications through the APIs it implements. APIs like OpenGL, OpenGL ES, Vulkan, etc. All …

July 30, 2018

libinput's documentation started out as doxygen of the developer API - they were the main target 4 years ago. Over time, more and more extra documentation was added and now most of it is aimed at users (for self-debugging and troubleshooting or just to explain concepts and features). Unfortunately, with doxygen this all ends up in the "Related Pages". The developer API documentation itself became a less important part, by now all the major compositors have libinput support and it doesn't change much. So while it needs to be there, most of the traffic goes to the user documentation (I think, it's not like I'm running stats).

Something more suited for prose-style docs was needed. I prefer the RTD look so last week I converted most of the libinput documentation into RST format and it's now built with sphinx and the RTD theme. Same URL as before: http://wayland.freedesktop.org/libinput/doc/latest/.

The biggest difference is that the Developer API Documentation (still doxygen) is now at http://wayland.freedesktop.org/libinput/doc/latest/api/, (i.e. add /api/ to the link). If you're programming against libinput's API (e.g. because you're writing a compositor), that's where you need to go.

It's still basically the same content as before, I'll be tidying things up and adding to it over the next few weeks. Hopefully without breaking existing links. There is probably detritus from the doxygen → rst change floating around, I'll be fixing that too. If you want to help out please don't hesitate, I'll do my best to be quick to review any merge requests.

I made some progress on GMP this week, getting jobs to run successfully if I enable regions in the GMP but never disable them. I need to do some more work on BO lifetime management if I want to clear disallowed regions back out of the GMP.

I did some more conformance debug, fixing one of the intermittent issues in GLES3 (The same rendering-versus-VS-texturing synchronization bug that broke glmark2 terrain on vc4). GLES3 conformance is now at GLES2 status plus one new set of intermittent issues. Another fix I made (not misusing HW semaphores) unfortunately doesn’t seem to have improved anything.

I had a few performance ideas left over from last week, and built them on Monday (fixed extra flushing due to glClear() after drawing, or the GFXH-1461 workaround, avoided storing buffers that had all their drawing masked out). Performance of simple apps was still astoundingly slow, so I did some more poking around and determined that my V3D HW is actually running at 37 MHz. My other board runs at 27 MHz. Now I know why my conformance runs have been taking so long! Unfortunately, Broadcom STB doesn’t support clock control from Linux, so I won’t really be able to do performance work on this driver until I can get a SW stack under me that can set the right clock speeds.

Finally, I cleaned up and merged my CLIF dumping code, so V3D can now generate simple traces that can be replayed by the software simulator or on FPGAs to debug lockups. It’s going to take a bit more work to make it so that BO addresses in uniforms work, which is needed for texturing, UBOs, and spilling.

July 29, 2018

The All Systems Go! 2018 Call for Participation Closes TODAY!

The Call for Participation (CFP) for All Systems Go! 2018 will close TODAY, on 30th of July! We’d like to invite you to submit your proposals for consideration to the CFP submission site quickly!

ASG image

All Systems Go! is everybody's favourite low-level Userspace Linux conference, taking place in Berlin, Germany in September 28-30, 2018.

For more information please visit our conference website!

This is quite a long post. The executive summary is that freedesktop.org now hosts an instance of GitLab, which is generally available and now our preferred platform for hosting going forward. We think it offers a vastly better service, and we needed to do it in order to offer the projects we host the modern workflows they have been asking for.

In parallel, we’re working on making our governance, including policies, processes and decision making, much more transparent.

Some history

Founded by Havoc Pennington in 2000, freedesktop.org is now old enough to vote. From the initial development of the cross-desktop XDG specs, to supporting critical infrastructure such as NetworkManager, and now as the home to open-source graphics development (the kernel DRM tree, Mesa, Wayland, X.Org, and more), it’s long been a good home to a lot of good work.

We don’t provide day-to-day technical direction or enforce set rules: it’s a very loose collection of projects which we each trust to do their own thing, some with nothing in common but where they’re hosted.

Unfortunately, that hosting hasn’t really grown up a lot since the turn of the millennium. Our account system was forked (and subsequently heavily hacked) from Debian’s old LDAP-based system in 2004. Everyone needing direct Git commit access to projects, or the ability to upload to web space, has to file a bug in Bugzilla, where after a trip through the project maintainer, eventually an admin will get around to pulling their SSH and GPG (!) keys and adding an account by hand.

Similarly, creating or reconfiguring a Git repository also requires manual admin intervention, where on request one of us will SSH into the Git server and do whatever is required. Beyond Git and cgit for viewing, we provide Bugzilla for issue tracking, Mailman and Patchwork for code review and discussion, and ikiwiki for wikis. For our sins, we also have an FTP server running somewhere. None of these services are really integrated with each other; separate accounts and separate sets of permissions are required.

Maintaining these disparate services is a burden on both admins and projects. Projects are frequently blocked on admins adding users and changing their SSH keys, changing Git hooks, adding people to Patchwork, manually applying more duct tape to the integration between these services, and fixing the duct tape when it breaks (which is surprisingly often). As a volunteer admin for the service, doing these kinds of things is not exactly the reason we get out of bed in the morning; it also consumes so much time treading water that we haven’t been able to enable new features and workflows for the projects we host.

Seeking better workflows

As of writing, around one third of the non-dormant projects on fd.o have at some point migrated their development elsewhere; mostly to GitHub. Sometimes this was because the other sites were a more natural home (e.g. to sibling projects), and sometimes just because they offered a better workflow (integration between issue tracking and commits, web-based code review, etc). Other projects which would have found fd.o a natural home have gone straight to hosting externally, though they may use some of our services - particularly mailing lists.

Not everyone wants to make use of these features, and not everyone will. For example, the kernel might well never move away from email for issue tracking and code review. But the evidence shows us that many others do want to, and our platform will be a non-starter for them unless we provide the services they want.

A bit over three years ago, I set up an instance of Phabricator at Collabora to replace our mix of Bugzilla, Redmine, Trac, and JIRA. It was a great fit for how we worked internally, and upstream seemed like a good fit too; though they were laser-focused on their usecases, their extremely solid data storage and processing model made it quite easy to extend, and projects like MediaWiki, Haskell, LLVM and more were beginning to switch over to use it as their tracker. I set up an instance on fd.o, and we started to use it for a couple of trial projects: some issue tracking and code review for Wayland and Weston, development of PiTiVi, and so on.

The first point we seriously discussed it more widely was at XDC 2016 in Helsinki, where Eric Anholt gave a talk about our broken infrastructure, cleverly disguised as something about test suites. It became clear that we had wide interest in and support for better infrastructure, though with some reservation about particular workflows. There was quite a bit of hallway discussion afterwards, as Eric and Adam Jackson in particular tried out Phabricator and gave some really good feedback on its usability. At that point, it was clear that some fairly major UI changes were required to make it usable for our needs, especially for drive-by contributors and new users.

Last year, GNOME went through a similar process. With Carlos and some of the other members being more familiar with GitLab, myself and Emmanuele Bassi made the case for using Phabricator, based on our experiences with it at Collabora and Endless respectively. At the time, our view was that whilst GitLab’s code review was better, the issue tracking (being much like GitHub’s) would not really scale to our needs. This was mostly based on having last evaluated GitLab during the 8.x series; whilst the discussions were going on, GitLab were making giant strides in issue tracking throughout 9.x.

With GitLab coming up to par on issue tracking, both Emmanuele and I ended up fully supporting GNOME’s decision to base their infrastructure on GitLab. The UI changes required to Phabricator were not really tractable for the resources we had, the code review was and will always be fundamentally unsuitable being based around the Subversion-like model of reviewing large branches in one go, and upstream were also beginning to move to a much more closed community model.

gitlab.freedesktop.org

By contrast, one of the things which really impressed us about GitLab was how openly they worked, and how open they were to collaboration. Early on in GNOME’s journey to GitLab, they dropped their old CLA to replace it with a DCO, and Eliran Mesika from GitLab’s partnership team came to GUADEC to listen and understand how GNOME worked and what they needed from GitLab. Unfortunately this was too early in the process for us, but Robert McQueen later introduced us, and Eliran and I started talking about how they could help freedesktop.org.

One of our bigger issues was infrastructure. Not only were our services getting long in the tooth, but so were the machines they ran on. In order to stand up a large new service, we’d need new physical machines, but a fleet of new machines was beyond the admin time we had. It also didn’t solve issues such as everyone’s favourite: half of Europe can’t route to fd.o for half an hour most mornings due to obscure network issues with our host we’ve had no success diagnosing or fixing.

GitLab Inc. listened to our predicament and suggested a solution to help us: that they would sponsor our hosting on Google Cloud Platform for an initial period to get us on our feet. This involves us running the completely open-source GitLab Community Edition on infrastructure we control ourselves, whilst freeing us from having to worry about failing and full disks or creaking networks. (As with GNOME, we politely declined the offer of a license to the pay-for GitLab Enterprise Edition; we wanted to be fully in control of our infrastructure, and on a level playing field with the rest of the open-source community.)

They have also offered us support, from helping a cloud idiot understand how to deploy and maintain services on Kubernetes, to taking the time to listen and understand our workflows and improve GitLab for our uses. Much of the fruit of this is already visible in GitLab through feedback from us and GNOME, though there is always more to come. In particular, one area we’re looking at is integration with mailing lists and placing tags in commit messages, so developers used to mail-based workflows can continue to consume the firehose through email, rather than being required to use the web UI for everything.

Last Christmas, we gave ourselves the present of standing up gitlab.freedesktop.org on GCP, and set about gradually making it usable and maintainable for our projects. Our first hosted project was Panfrost, who were running on either non-free services or non-collaborative hosted services. We wanted to help them out by getting them on to fd.o, but they didn’t want to use the services we had at the time, and we didn’t want to add new projects to those services anyway.

Over time, as we stabilised the deployment and fleshed out the feature set, we added a few smaller projects, who understood the experimental nature and gave us space to make some mistakes, have some down time, and helped us smooth out the rough edges. Some of the blocker here was migrating bugs: though we reused GNOME’s bztogl script, we needed some adjustments for our different setups, as well as various bugfixes.

Not long ago, we migrated Mesa’s repository hosting, as well as Wayland and Weston for both repository hosting and issue tracking; these are our biggest projects to date.

What we offer to projects

With GitLab, we offer everything you would expect from gitlab.com (their hosted offering), or everything you would expect from GitHub with the usual external services such as Travis CI. This includes issue tracking integrated with repository management (close issues by pushing), merge requests with online review and merge, a comprehensive CI suite with shared runners available to all, custom sites built with whatever toolchain you like, external web hooks to integrate with other services, and a well-documented stable API which allows you to use external clients like git lab.

In theory, we’ve always provided most of the above services. Most of these - if you ignore the lack of integration between them - were more or less fine for projects running their own standalone infrastructure. But they didn’t scale to something like fd.o, where we have a very disparate family of projects sharing little in common, least of all common infrastructure and practices. For example, we did have a Jenkins deployment for a while, but it became very clear very early that this did not scale out to fd.o: it was impossible for us to empower projects to run their own CI without fatally compromising security.

Anyone familiar with the long wait for an admin to add an account or change an SSH key will be relieved to hear that this is no longer the case. Anyone can make an account on our GitLab instance using an email address and password, or with trusted external identity providers (currently Google, gitlab.com, GitHub, or Twitter) rather than having another username and password. We delegate permission management to project owners: if you want to give someone commit rights to your project, go right ahead. No need to wait for us.

We also support such incredible leading-edge security features as two-factor TOTP authentication for your account, Recaptcha to protect against spammers, and ways of deleting spam which don’t involve an admin sighing into a SQL console for half an hour, trying to not accidentally delete all the content.

Having an integrated CI system allows our projects to run test pipelines on merge requests, giving people fast feedback about any required changes without human intervention, and making sure distcheck works all the time, rather than just the week before release. We can capture and store logs, binaries and more as artifacts.

The same powerful system is also the engine for GitLab Pages: you can use static site generators like Jekyll and Hugo, or have a very spartan, hand-written site but also host auto-generated documentation. The choice is yours: running everything in (largely) isolated containers means that you can again do whatever you like with your own sites, without having to ask admins to set up some duct-taped triggers from Git repositories, then ask them to fix it when they’ve upgraded Python and everything has mysteriously stopped working.

Migration to GitLab, and legacy services

Now that we have a decent and battle-tested service to offer, we can look to what this means for our other services.

Phabricator will be decommissioned immediately; a read-only archive will be taken of public issues and code reviews and maintained as static pages forever, and a database dump will also be kept. But we do not plan to bring this service back, as all the projects using it have already migrated away from it.

Similarly, Jenkins has already been decommissioned and deactivated some time ago.

Whilst we are encouraging projects to migrate their issue tracking away from Bugzilla and helping those who do, we realise a lot of projects have built their workflows around Bugzilla. We will continue to maintain our Bugzilla installation and support existing projects with its use, though we are not offering Bugzilla to new projects anymore, and over the long term would like to see Bugzilla eventually retired.

Patchwork (already currently maintained by Intel for their KMS and Mesa work) is in the same boat, complicated by the fact that the kernel might never move away from patches carved into stone tablets.

Hopefully it goes without saying that our mailing lists are going to be long-lived, even if better issue tracking and code review does mean they’re a little less-trafficked than before.

Perhaps most importantly, we have anongit and cgit. anongit is not provided by GitLab, as they rightly prefer to serve repositories over https. Given that, for all existing projects we are maintaining anongit.fd.o as a read-only mirror of GitLab; there are far too many distributions, build scripts, and users out there with anongit URIs to discontinue the service. Over time we will encourage these downstreams to move to HTTPS to lessen the pressure, but this will continue to live for quite some time. Having cgit live alongside anongit is fairly painless, so we will keep it running whilst it isn’t a burden.

Lastly, annarchy.fd.o (aka people.fd.o) is currently offered as a general-purpose shell host. People use this to manage their Git repositories on people.fd.o and their files publicly served there. Since it is also the primary web host for most projects, both people and scripts use it to deploy files to sites. Some people use it for random personal file storage, to run various scripts and even as a personal IRC host. We are trying to transition these people away from using annarchy for this, as it is difficult for us to provide totally arbitrary resources to everyone who has at one point had an account with one of our member projects. Running a source of lots of IRC traffic is also a good way to make yourself deeply unpopular with many hosts.

Migrating your projects

Now that the service has been iterated on and fleshed out, we are happy to offer migration to all our projects. For each project, we will ask you to file an issue using the migration template. This gives you a checklist with all the information we need to migrate your Git repositories, as well as your existing Bugzilla bugs.

Every user with a freedesktop.org SSH account already has an account created for them on GitLab, with access to the same groups. In order to recover access to the migrated accounts, you can request a password-reset link by entering the email address you signed up with into the ‘forgotten password’ box on the GitLab front page.

More information is available on the freedesktop GitLab wiki, and of course the admins are happy to help if you have any problems with this. The usual failure mode is that your email address has changed since you signed up: we’ve had one user who needed it changed as they were still using a Yahoo! mail address.

Governance and process

Away from technical issues, we’re also looking to inject a lot more transparency into our processes. For instance, why do we host kernel graphics development, but not new filesystems? What do we look for (both good and bad), and why is that? What is freedesktop.org even for, and who is it serving?

This has just been folk knowledge for some time; passed on by oral legend over IRC as verbal errata to out-of-date wiki pages. Just as with technical issues, this is not healthy for anyone: it’s more difficult for people to get involved and give us the help we so clearly need, it’s more difficult for our community to find out what they can expect from us and how we can help them, and it’s impossible for anyone to measure how good a job we’re actually doing.

One of the reasons we haven’t done a great job at this is because just keeping the Rube Goldberg machine of our infrastructure running exhausts basically all the time we have to deal with fd.o. The time we spend changing someone’s SSH keys by hand, or debugging a Git post-receive hook, is time we’re not spending on the care and feeding of our community.

We’ve spent the past couple of years paying down our technical debt, and the community equivalent thereof. Our infrastructure is much less error-prone than it was: we’ve gone from fighting fires to being able to prepare the new GitLab infrastructure and spend time shepherding projects through it. Now that we have a fair few projects on GitLab and they’ve been able to serve themselves, we’ve been able to take some time for community issues.

Writing down our processes is still a work in progress, but something we’ve made a little more headway on is governance. Currently fd.o’s governance is myself, Keith and Tollef discussing things and coming to some kind of conclusion. Sometimes that’s in recorded public fora, sometimes over email with searchable archives, sometimes just over IRC message or verbally with no public record of what happened.

Given that there’s a huge overlap between our mission and that of the X.Org Foundation (which is a lot more than just X11!), one idea we’re exploring is to bring fd.o under the Foundation’s oversight, with clear responsibility, accountability, and delegated roles. The combination of the two should give our community much more insight into what we’re doing and why - as well as, crucially, the chance to influence it.

Of course, this is all conditional on fd.o speaking to our member projects, and the Foundation speaking to its individual members, and getting wide agreement. There will be a lot of tuning required - not least, the Foundation’s bylaws would need a change which needs a formal vote from the membership - but this at least seems like a promising avenue.

July 26, 2018

A couple of months ago we visited IKEA and saw the IKEA Trådfri smart lighting system. Since it was relatively cheap we decided to buy their starter pack and enough bulbs to make the recessed lights in our living room smart. I got it up and running and had some fun switching the light hue and turning the lights on and off from my phone.
A bit later I got a Google Assistant speaker at Google I/O and the system suddenly became somewhat more useful as I could control the lights by calling out to the google assistant. I was also able to connect our AC system thermostats to the google assistant so we could change the room temperature using voice commands.
As a result of this I ended up reading up on smart home technologies and found that my IKEA hub and bulbs were conforming to the Zigbee standard and that I should be able to buy further Zigbee compatible devices from other vendors to extend it. So I ordered a Zigbee compatible in-wall light switch from GE and I also ordered a Zigbee compatible in-ceiling switch from a company called Nue through Amazon. Once I got these home and tried to get them up and running I found that my understanding was flawed, as there are two Zigbee standards, the older Zigbee HA and the newer Zigbee LightLink. The IKEA stuff is Zigbee LightLink while at least the GE switch was Zigbee HA and thus the IKEA hub could not control it. So I ended up ordering a Samsung SmartThings hub which supports Zigbee HA, Zigbee LL and the competing system called Z-Wave. At which point I got both my IKEA lights and my two devices working with it.
In the meantime I had also gotten myself a Google Assistant compatible portable aircon from Frigidaire for my home office (not part of the main house) and a Google Assistant compatible fire alarm from Nest for the same office space.

So having lived in my new smarter home for a while now, what are my conclusions? Well, first of all, that we still have some way to go before this is truly seamless and obvious. I consider myself a fairly technical person, but it still took quite a bit of googling for me to be able to get everything working. Secondly, a lot of the smart home stuff feels a bit gimmicky in the end. For instance the Frigidaire portable air conditioner integration with the Google Assistant is more annoying than useful. It basically requires me to start by asking to talk to Frigidaire and then listen to a ton of crap before I can even start trying to do anything. As for the lights, we do actually turn them on and off quite a bit using the voice commands (at least I try to, until I realize my wife has disconnected the Google Assistant in order to use its cable to charge her phone :). I also realized that while installing and buying the in-wall switches is a bit more costly and complicated than just getting some smart bulbs, it works a lot better, as I can then control the lights using both voice and the switch. The smart bulbs can not be turned on by voice if a normal switch is turned off (obvious, but not something I thought about before buying). So while getting something like the IKEA bulbs is a nice and cheap way to try this stuff out, I don’t see it as our long-term solution here. The thermostat we haven’t controlled or queried once since I did the initial testing of its connection to the Google Assistant.

All in all I have to say that the smart home tech is cute, but it is far from being essential. I might end up putting in more wall switches for the light going forward, but apart from that, having smart home support in a device is not going to drive my purchasing decisions. Maybe as the tech matures and becomes more mainstream it will also become more useful, but as it stands it mostly solves first world problems (although of course there are real gains here for various accessibility situations).

July 25, 2018

Recently, I submitted a talk about VKMS in a conference named Linux Developer Conference Brazil (linuxdev-br)[1] and got accepted [2]. I am delighted with this opportunity, since I attended linuxdev-br 2017 as a complete newbie and the conference provided me the required inputs to start working in the Kernel. Also, I had the opportunity to talk with great kernel developers and learn a little bit more with them. As a result, I have a lot of affection for this conference; presenting at linuxdev-br 2018 means a lot to me.

In the talk, I will explain how I started with Linux Kernel and got accepted in the GSoC 2018. Following that, I want to provide an overview of the DRM subsystem and finish with the explanation of the VKMS. I want to detail some of the features that I developed, and I will try to explain the features developed by Haneen. As soon as I have the presentation done, I will post it here.

Reference

  1. Linux Developer Conference Brazil
  2. Linuxdev-br speakers

Gather round children, it's story time. Especially for you children who lurk on /r/linux and think you may learn something there. Today, I'll tell you a horror story. The one where we convert kernel input events into touchpad events, with the subtle subtitle of "friends don't let friends handle evdev events".

The question put forward is "why do we need libinput at all", when, as frequently suggested on the usual websites, it's sufficient to just read evdev data and there's really no need for libinput. That is of course true. You can use evdev events from the kernel directly. Did you know that the events the kernel gives you are absolute coordinates? And that not all touchpads have buttons? Or that some touchpads have specific event sequences that need to be filtered? No? Well, boy, are you in for a few surprises! Anyway, let's go and handle evdev events ourselves and write our own libmyinput.

How do we know something is a touchpad? Well, we look at the exposed evdev bits. We need ABS_X, ABS_Y and BTN_TOOL_FINGER but don't want INPUT_PROP_DIRECT. If the latter bit is set then we have a touchscreen (probably). We don't actually care about buttons here, that comes later. ABS_X and ABS_Y give us device-absolute coordinates. On touch down you get the evdev frame of "a finger is down at x/y device units from the top-left". As you move around, you get the x/y coordinate updates. The data itself is exactly the same as you would get from a touchscreen, but we know it's a touchpad because we queried the other bits at startup. So your first job is to convert the absolute x/y coordinates to deltas by subtracting the previous position.
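To give a flavour of what that looks like, here's a rough sketch (emphatically not libinput's actual code) that reads one evdev frame and turns the absolute coordinates into a delta; device setup and error handling are omitted:

#include <stdio.h>
#include <unistd.h>
#include <linux/input.h>

/* Rough sketch only: turn absolute touchpad coordinates into deltas. */
static void process_frame(int fd)
{
	static int have_prev, prev_x, prev_y;
	int x = prev_x, y = prev_y;
	struct input_event ev;

	/* read events until the SYN_REPORT that closes this frame */
	while (read(fd, &ev, sizeof(ev)) == sizeof(ev)) {
		if (ev.type == EV_ABS && ev.code == ABS_X)
			x = ev.value;
		else if (ev.type == EV_ABS && ev.code == ABS_Y)
			y = ev.value;
		else if (ev.type == EV_SYN && ev.code == SYN_REPORT)
			break;
	}

	if (have_prev)
		printf("delta: %d/%d (in device units, not mm!)\n",
		       x - prev_x, y - prev_y);

	prev_x = x;
	prev_y = y;
	have_prev = 1;
}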

Touchpads have different resolutions for x and y so a delta of 10/10 does not mean it's a 45-degree movement. Better check with the resolution to convert this to physical distances to be on the safe side. Oh, btw, the axes aren't reliable. The min/max ranges and the resolutions are wrong on a large number of touchpads. Luckily systemd fixes this for you with the 60-evdev.hwdb. But I should probably note that hwdb only exists because of libinput... Either way, you don't have to care about it because the road's already paved. You're welcome.

Oh wait, you do have to care a little because there are touchpads (e.g. HP Stream 11, ZBook Studio G3, ...) where bits are missing or wrong. So you better write a device database that tells you when you have to correct the evdev bits. You could implement this as a config option but that's just saying "I know what's wrong here, I know how to fix it but I'm still going to make you google for it and edit a local configuration file to make it work". You could treat your users this way, but you really shouldn't.

As you're happily processing your deltas, you notice that on some touchpads you get motion before you touch the touchpad. Ooops, we need a way to tell whether a finger is down. Luckily the kernel gives you BTN_TOUCH for that event, so you switch your implementation to only calculate deltas when BTN_TOUCH is set. But then you realise that is effectively a hardcoded threshold in the kernel and does not match a lot of devices. Some devices require too-hard finger pressure to trigger BTN_TOUCH, others send it on super-light pressure or even while hovering. After grinding some enamel away you find that many touchpads give you ABS_PRESSURE. Awesome, let's make touches pressure-based instead. Let's use a threshold, no, I mean a device-specific threshold (because if two touchpads were the same the universe would stop doing whatever a universe does, I clearly haven't thought this through). Luckily we already have the device database so we just add the thresholds there.

Oh, if you want this to run on an Apple touchpad you'd better implement touch size handling (ABS_MT_TOUCH_MAJOR/ABS_MT_TOUCH_MINOR). These axes give you the size of the touching ellipse, which is great. Except that the values are just arbitrary number ranges that bear no relation to physical properties, so better update your database so you can add those thresholds too.

Ok, now we have single-finger handling in our libnotinput. Let's add some sophisticated touchpad features like button clicks. Buttons are easy, the kernel gives us BTN_LEFT and BTN_RIGHT and, if you're lucky, BTN_MIDDLE. Unless you have a clickpad of course in which case you only ever get BTN_LEFT because the whole touchpad can be depressed (much like you, if you continue writing your own evdev handling). Those clickpads are in the majority of laptops these days, so we have to deal with them. The two approaches we have are "software button areas" and "clickfinger". The former detects where your finger is when you push the touchpad down - if it's in the bottom right corner we convert the kernel's BTN_LEFT to a BTN_RIGHT and pass that on. Decide how big the buttons will be (note: some touchpads that need software buttons are only 50mm high, others exceed 100mm height). Whatever size you choose, it's an invisible line on the touchpad. Do you know yet how you will handle a finger that moves from outside the button area into the button area before the click? Or the other way round? Maybe add this to your todo list for fixing later.
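Here's a rough sketch of the software-button-area idea; the geometry fractions below are invented for illustration, the real thing uses physical sizes and per-device quirks:

#include <linux/input.h>

/* Illustrative only: map a click to a virtual button based on where the
 * finger is. Assumes the axis minimum is 0 for brevity; the bottom-20%
 * band and the 2/3 split are made-up example values. */
static int software_button_code(int x, int y, int max_x, int max_y)
{
	int in_button_band = y > max_y - max_y / 5;	/* bottom ~20% of the pad */

	if (!in_button_band)
		return BTN_LEFT;		/* clicks elsewhere stay left clicks */

	if (x > (max_x * 2) / 3)
		return BTN_RIGHT;		/* bottom-right corner becomes a right click */

	return BTN_LEFT;
}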

Maybe "clickfinger" is easier? It counts how many fingers are on the touchpad when clicking (1 finger == left click, 2 fingers == right click, 3 fingers == middle click). Much easier, except that so far we only handle one finger. The easy fix is to use BTN_TOOL_DOUBLETAP and BTN_TOOL_TRIPLETAP which are bitflags that tell you when a second/third finger are down. Add that to your libthisisnotlibinput. Coincidentally, users often click with their thumb while moving. So you have one finger moving the pointer, then a thumb click. Two fingers down but the user doesn't perceive it as such, this should be a left click. Oops, we don't actually know where the second finger is.

Let's switch our libstillnotlibinput to use ABS_MT_POSITION_X and ABS_MT_POSITION_Y because that gives us per-finger position information (once you understand how the kernel's MT protocol slots work). And when I say "switch" of course I meant "add" because there are still touchpads in use that don't support multitouch so you get to keep both implementations. There are also a bunch of touchpads that can give you the position of two fingers but not of the third. Wipe that tear away and pencil that into your todo list. I haven't mentioned semi-mt devices yet that will give you multitouch position data for two fingers but it won't track them correctly - the first touch position is always the top/left of the bounding box, the second touch is always the bottom/right of the bounding box. Do the right thing for our libwhathaveidone and just pretend semi-mt devices are single-touch touchpads. libinput (the real one) does the same because my sanity is something to be cherished.
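To give an idea of what consuming the kernel's type-B multitouch protocol looks like, here is a rough sketch (not libinput code; the slot count and all error handling are simplified):

#include <linux/input.h>

#define MAX_SLOTS 10	/* real code sizes this from the ABS_MT_SLOT axis maximum */

struct touch {
	int active;	/* a tracking id other than -1 means a finger is down in this slot */
	int x, y;	/* device coordinates; convert to mm via the axis resolution */
};

/* Feed every event of a frame through this; *slot persists across calls. */
static void handle_mt_event(const struct input_event *ev,
			    struct touch touches[MAX_SLOTS], int *slot)
{
	if (ev->type != EV_ABS)
		return;

	if (ev->code == ABS_MT_SLOT) {
		*slot = ev->value;	/* following events apply to this slot */
		return;
	}

	if (*slot < 0 || *slot >= MAX_SLOTS)
		return;

	switch (ev->code) {
	case ABS_MT_TRACKING_ID:
		touches[*slot].active = (ev->value != -1);
		break;
	case ABS_MT_POSITION_X:
		touches[*slot].x = ev->value;
		break;
	case ABS_MT_POSITION_Y:
		touches[*slot].y = ev->value;
		break;
	}
}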

Oh, on another note, some touchpads don't have any buttons (some Wacom tablets are large touchpads). Add that to your todo list. You wanted middle buttons to work? Few touchpads have a middle button (clickpads never do anyway). Better write a middle button emulation system that generates BTN_MIDDLE when both buttons are pressed. Or when a finger is on the left and another finger is on the right software button. Or when a finger is in a virtual middle button area. All these need to be present because if not, you get dissed by users for not implementing their favourite interaction method.

So we're several paragraphs in and so far we have: finger tracking and some button handling. And a bunch of things on the todo list. We haven't even started with other fancy features like edge scrolling, two-finger scrolling, pinch/swipe gestures or thumb and palm detection. Oh, and you're not yet handling any other devices like graphics tablets which are a world of their own. If you think all the other features and devices are any less of a mess... well, an Austrian comedian once said (paraphrased): "optimism is just a fancy word for ignorance".

All this is just handling features that users have come to expect. Examples of non-features that you'll have to implement: on some Lenovo series (*50 and newer) you will get a pointer jump after a series of events that only have pressure information. You'll have to detect and discard that jump. The HP Pavilion DM4 touchpad has random jumps in the slot data. Synaptics PS/2 touchpads may 'randomly' end touches and restart them on the next event frame 10ms later. If you don't handle that you'll get ghost taps. And so on and so forth.

So as you, happily or less so, continue writing your libthisismoreworkthanexpected you'll eventually come to realise that you're just reimplementing libinput. Congratulations or condolences, whichever applies.

libinput's raison d'etre is that it deals with all the mess above so that compositor authors can be blissfully unaware of all this. That's the reason why all the major/general-purpose compositors have switched to libinput. That's the reason most distributions now use libinput with the X server (through the xf86-input-libinput driver). libinput has made some design decisions that you may disagree with but honestly, that's life. Deal with it. It doesn't even do all I want and I wrote >90% of it. Suggesting that you can just handle evdev directly is like suggesting you can use GPS coordinates directly to navigate. Sure you can, but there's a reason why people instead use a Tom Tom or Google Maps.

July 23, 2018

Two more weeks of conformance debug, and I’m now down to 1 group of failures in GLES2 and 1 group in EGL, both of which I believe are bugs in the tests (regressions in recent CTS releases, which is why Intel hadn’t caught them already).

Fixes have included:

  • Various bugs in fence handling (one of which caused throttling of frames to fail, leading to unlimited userspace BO cache usage and 2 days of debug flailing to track down).
  • Allow EGLImages to be created of textures/renderbuffers that we can’t do EGL_MESA_image_dma_buf_export on.
  • Fixed EGLImage import for cubemap images.
  • Landed the UIF format modifier support.
  • Fixed uniform lowering to not cause absurd register spilling.
  • Implemented more of the GFXH-1461 workaround to fix missing Z/S clears
  • Implemented a hack in gallium’s blitter to make the awful dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* tests pass.
  • Fixed incorrect reallocation of PERSISTENT buffers (also a vc4 bug!)
  • Fixed 1D_ARRAY texture upload/download.
  • Fixed drawing to MRT without independent blending
  • Fixed MSAA mapping bugs in gallium’s transfer helper.

I also spent a day knocking out some quick performance improvements:

  • Rotate through the registers to allow the backend scheduler to pair up QPU instructions more frequently.
  • Allow reading from the phys regfile in the following instruction.
  • Use the SFU instructions instead of magic writes on V3D 4.x.
  • Skip texture parameter P2 if it’s just the default value.
  • Start using small immediates (especially useful for VS input/outputs on 4.x).
  • Fix extra scene flush when doing the GFXH-1461 workaround.

After this I should probably work on the GMP support again, then get back to fixing GLES3.

July 22, 2018

The All Systems Go! 2018 Call for Participation Closes in One Week!

The Call for Participation (CFP) for All Systems Go! 2018 will close in one week, on 30th of July! We’d like to invite you to submit your proposals for consideration to the CFP submission site quickly!

ASG image

Notification of acceptance and non-acceptance will go out within 7 days of the closing of the CFP.

All topics relevant to foundational open-source Linux technologies are welcome. In particular, however, we are looking for proposals including, but not limited to, the following topics:

  • Low-level container executors and infrastructure
  • IoT and embedded OS infrastructure
  • BPF and eBPF filtering
  • OS, container, IoT image delivery and updating
  • Building Linux devices and applications
  • Low-level desktop technologies
  • Networking
  • System and service management
  • Tracing and performance measuring
  • IPC and RPC systems
  • Security and Sandboxing

While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome, as long as they have a clear and direct relevance for user-space.

For more information please visit our conference website!

July 18, 2018

Battery life
I was very happy to see that Fedora Workstation 28 in the Phoronix benchmark got the best power consumption on a Dell XPS 13. Improving battery life has been a priority for us and Hans de Goede has been doing some incredible work so far. And best of all; more is to come :). So if you want great battery life with Linux on your laptop then be sure to be running Fedora on your laptop! On that note and to state the obvious, be aware that Fedora Workstation adoption rates are actually a major metric for us to decide where to put our efforts, so if we see good growth in Fedora due to people enjoying the improved battery life it enables us to keep investing in improving battery life; if we don’t see the growth we will need to conclude people don’t care that much and move our investment elsewhere.

Desktop remoting under Wayland
The team is also making great strides with desktop remoting under Wayland. In Fedora Workstation 29 we will have the VNC based GNOME Shell integrated desktop sharing working under Wayland thanks to the work done by Jonas Ådahl. It relies on PipeWire to share your Wayland session over VNC.
On a similar note Jan Grulich, Tomas Popela and Eike Rathke have been working on enabling Wayland desktop sharing through Firefox and Chromium. They are reporting good progress and actually did a video call between Firefox and Chromium last week, sharing their desktops with each other. This is enabled by providing a PipeWire backend for both Firefox and Chromium. They are now working on cleaning up their patches and preparing them for submission upstream. We are also looking at providing a patched Firefox in Fedora Workstation 28 supporting this.

PipeWire
Wim Taymans talked about and demonstrated the latest improvements to PipeWire during GUADEC last week. He now has a libpulse.so drop-in replacement that allows applications like Totem and Rhythmbox to play audio through PipeWire via the PulseAudio GStreamer plugin. Wim also keeps improving the JACK support in PipeWire by testing JACK applications one by one and fixing corner cases as he discovers them or they are reported by the Linux pro-audio community. We also ended up ordering Wim a Sony HT-Z9F soundbar for testing as we want to ensure PipeWire has great support for passthrough, be that SPDIF, HDMI or Bluetooth. The HT-Z9F also supports LDAC audio which is a new high quality audio format for Bluetooth and we want PipeWire to have full support for it.
To accelerate PipeWire development and adoption for audio we also decided to try to organize a PipeWire and Linux Audio hackfest this fall, with the goal of mapping out our remaining issues and trying to bring the wider Linux audio community together. So I am very happy that Arun Raghavan of PulseAudio fame agreed to be one of the co-organizers of this hackfest. Anyone interested in attending the PipeWire 2018 hackfest can either add themselves to the attendee list or contact me (contact information can be found through the hackfest page) and I will be happy to add you. The primary goal is to have developers from the PulseAudio and JACK communities attend alongside Wim Taymans and Bastien Nocera so we can make sure we have everything we need on the development roadmap and try to ensure we all pull in the same direction.

GNOME Builder
Christian Hergert did an update during GUADEC this year on GNOME Builder. As usual a ton of interesting stuff is happening, including new support for developing for embedded devices like the upcoming Purism phone. Christian in his talk mentioned how Builder is probably the world's first 'Container Native IDE': it is both developed with being packaged as a Flatpak in mind and developed with the aim of creating Flatpaks as its primary output. So a lot of effort is being put into making sure it works well inside a container itself, but it also has all the bells and whistles for creating containers from your code. Another worthwhile point to mention is that Builder is also one of the best IDEs for doing Rust development in general!

Game mode in Fedora
Feral Interactive, one of the leading Linux game companies, released a tool they call gamemode for Linux not long ago. Since we want gamers to be first class citizens in Fedora Workstation we ended up going back and forth internally a bit about what to do about it, basically discussing if there was another way to resolve the problem even more seamlessly than gamemode. In the end we concluded that while the ideal solution would be to have the default CPU governor be able to deal with games better, we also realized that the technical challenge games posed to the CPU governor, by having a very uneven workload, is hard to resolve automatically and not something we have the resources currently to take a deep dive into. So in the end we decided that just packaging gamemode was the most reasonable way forward. So the package is lined up for the next batch update in Fedora 28 so you should soon be able to install it and for Fedora Workstation 29 we are looking at including it as part of the default install.

July 16, 2018

Add infrastructure for Vblank and page flip events in vkms simulated by hrtimer


Since the beginning of May 2018, I have been diving into the DRM subsystem. In the beginning, nothing made sense to me, and I had to fight hard to understand how things work. Fortunately, I was not alone, and I had great support from Gustavo Padovan, Daniel Vetter, Haneen Mohammed, and the entire community. Recently, I finally delivered a new feature for VKMS: the infrastructure for Vblank and page flip events.

At this moment, VKMS has regular Vblank events simulated through hrtimers (see drm-misc-next), which is a feature required by VKMS to mimic real hardware [6]. The development approach was entirely driven by the tests provided by IGT, more specifically kms_flip. I modified IGT to read a module name via the command line and force the use of it, instead of using only the modules defined in the code (patch submitted to IGT, see [1]). With this modification in IGT, my development process to add a Vblank infrastructure to VKMS had three main steps, as Figure 1 describes.

Figure 1: My work cycle in VKMS

Firstly, I focused only on the subtest “basic-plain-flip” from IGT, and after each execution of the test I checked the failure messages. Secondly, I tried to write the required code to make the test pass; it is essential to highlight that this phase sometimes took me days to understand the problem and implement a fix. Finally, after I overcame the failure, I put in additional effort to improve the implementation. As can be seen in the patchset sent to add the Vblank support [2], the first set of patches was not directly related to the Vblank itself, but it was the necessary infrastructure required for kms_flip to work.

Figure 2: sudo ./tests/kms_flip --run-subtest basic-plain-flip --force-module vkms

After an extended period of work to make VKMS pass basic-plain-flip, I finally achieved it, thanks to all the support that I received from the DRM community. Next, I started to work on the subtest “wf_vblank-ts-check”, and here I spent a lot of time debugging problems. My issue was the stochastic test behavior: sometimes it passed and other times it failed, and I supposed the problem was related to the accumulation of errors during the page flip step. As a result, I put considerable effort into making the page flip timer precise and ended up with a patch that calculates the exact moment of the next period (see [5]). Nevertheless, after I submitted the patch, Chris Wilson highlighted that I was reinventing the wheel since hrtimer already did the required calculations [4]; he was 100% right. After his comment I went through hrtimer_forward line by line and concluded that I had implemented the same algorithm. I lost some days recreating something that was not useful in the end; however, it was really valuable for me since I learned how hrtimer works and also expanded my comprehension of Vblank. Finally, Daniel Vetter precisely pointed out a series of problems in the patch (see [3]) that not only improved the tests, but also made most of the tests in kms_flip pass.
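For readers who, like me, had not used hrtimers before, the pattern Chris pointed me to boils down to something like the sketch below. This is only an illustration with made-up structure and field names, not the actual VKMS code:

#include <linux/kernel.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

/* Illustrative fragment only - the structure below is invented for this example. */
struct fake_vkms_output {
	struct hrtimer vblank_hrtimer;
	ktime_t period_ns;		/* frame period, e.g. 1 s / refresh rate */
};

static enum hrtimer_restart fake_vblank_simulate(struct hrtimer *timer)
{
	struct fake_vkms_output *out =
		container_of(timer, struct fake_vkms_output, vblank_hrtimer);

	/* here the driver would signal the vblank to DRM,
	 * e.g. via drm_crtc_handle_vblank() */

	/*
	 * hrtimer_forward_now() advances the timer by whole periods past
	 * "now" and returns the number of missed periods - exactly the
	 * catch-up arithmetic I had reimplemented by hand.
	 */
	hrtimer_forward_now(timer, out->period_ns);

	return HRTIMER_RESTART;
}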

In conclusion, adding the infrastructure for Vblank and page flip events in vkms was an exciting feature for VKMS, but it was also an important task that taught me how things work in DRM. I am still focused on this part of VKMS, but now I am starting to think about how I can add virtual hardware which does not support the Vblank interrupt. Finally, I want to write a detailed blog post on how I implemented the Vblank support in VKMS and another post about timers (user and kernel space); I believe this sort of post could be helpful for someone who is just starting in the DRM subsystem.

Thanks to the whole DRM community, which is always kind and provides great help to a newcomer like me :)

Reference

  1. Force option in IGT
  2. Adding infrastructure for Vblank and page flip events in vkms
  3. Daniel Vetter comments in V2
  4. Chris Wilson comments about hrtimer
  5. Calculating the period
  6. DRM misc-next
July 09, 2018

Now that my V3D GPU hangs are cleared up, I made a lot of progress on conformance:

Fixes from last week:

  • Don’t reset the GPU until it stops making progress on the job (instead of just a fixed timeout).
  • Fixed noperspective interpolation.
  • Fixed leaks of default attributes and spill BOs.
  • Fixed ARB_color_buffer_float clamping of gl_FragData[]
  • Fixed support for GL_EXT_draw_buffers2’s independent blend enables.
  • Fixed GL_SAMPLE_ALPHA_TO_ONE.

Fail/(Pass+Fail) ratios after my weekend run:

  • EGL: 96/2143
  • GLES2: 7/16985
  • GLES3: 25/42549
  • GLES3 multisample: crash!

I’ve debugged the first MSAA crash, but there are some MSAA resolve bugs after that which I need to sort out (some of which are probably the cause of those EGL failures).

On the VC4 front, Boris landed the writeback changes, and I submitted the DT for it in my ARM pull request.

July 08, 2018

A common error when building from source is something like the error below:


meson.build:50:0: ERROR: Native dependency 'foo' not found
or a similar error

meson.build:63:0: ERROR: Invalid version of dependency, need 'foo' ['>= 1.1.0'] found '1.0.0'.
Seeing that can be quite discouraging but, luckily, in many cases it's not too difficult to fix. As usual, there are many ways to get to a successful result; I'll describe what I consider the simplest.

What does it mean? Dependencies are simply libraries or tools that meson needs to build the project. Usually these are declared like this in meson.build:


dep_foo = dependency('foo', version: '>= 1.1.0')
In human words: "we need the development headers for library foo (or 'libfoo') of version 1.1.0 or later". meson uses the pkg-config tool in the background to resolve that request. If we require package foo, pkg-config searches for a file foo.pc in the following directories:
  • /usr/lib/pkgconfig,
  • /usr/lib64/pkgconfig,
  • /usr/share/pkgconfig,
  • /usr/local/lib/pkgconfig,
  • /usr/local/share/pkgconfig
The error message simply means pkg-config couldn't find the file and you need to install the matching package from your distribution or from source.

An important note here: in most cases, we need the development headers of said library; installing just the library itself is not sufficient. After all, we're trying to build against it, not merely run against it.

What package provides the foo.pc file?

In many cases the package you need is the development variant of the library's package name. Try foo-devel (Fedora, RHEL, SuSE, ...) or foo-dev (Debian, Ubuntu, ...). yum and dnf provide a great shortcut to install any pkg-config dependency:


$> dnf install "pkgconfig(foo)"

$> yum install "pkgconfig(foo)"
will automatically search and install the right package, including its dependencies.
apt-get requires a bit more effort:

$> apt-get install apt-file
$> apt-file update
$> apt-file search --package-only foo.pc
foo-dev
$> apt-get install foo-dev
For those running Arch and pacman, the sequence is:

$> pacman -S pkgfile
$> pkgfile -u
$> pkgfile foo.pc
extra/foo
$> pacman -S extra/foo
Once that's done you can re-run meson and see if all dependencies have been met. If more packages are missing, follow the same process for the next file.

Any users of other distributions - let me know how to do this on yours and I'll update the post.

My version is wrong!

It's not uncommon to see the following error after installing the right package:


meson.build:63:0: ERROR: Invalid version of dependency, need 'foo' ['>= 1.1.0'] found '1.0.0'.
Now you're stuck and you have a problem. What this means is that the package version your distribution provides is not new enough to build your software. This is where the simple solutions end and it all gets a bit more complicated - with more potential errors. Unless you are willing to go into the deep end, I recommend moving on and accepting that you can't have the newest bits on an older distribution. Because now you have to build the dependencies from source, and that may then require building their dependencies from source, and before you know it you've built 30 packages. If you're willing, read on; otherwise - sorry, you won't be able to run your software today.

Manually installing dependencies

Now you're in the deep end, so be aware that you may see more complicated errors in the process. First of all you need to figure out where to get the source from. I'll now use cairo as an example instead of foo so you see actual data. On rpm-based distributions like Fedora, run dnf or yum:


$> dnf info cairo-devel # or yum info cairo-devel
Loaded plugins: auto-update-debuginfo, langpacks
Installed Packages
Name : cairo-devel
Arch : x86_64
Version : 1.13.1
Release : 0.1.git337ab1f.fc20
Size : 2.4 M
Repo : installed
From repo : fedora
Summary : Development files for cairo
URL : http://cairographics.org
License : LGPLv2 or MPLv1.1
Description : Cairo is a 2D graphics library designed to provide high-quality
: display and print output.
:
: This package contains libraries, header files and developer
: documentation needed for developing software which uses the cairo
: graphics library.
The important field here is the URL line - go to that URL and you'll find the source tarballs. That should be true for most projects, but you may need to google for the package name and hope. Search for the tarball with the right version number and download it. On Debian and related distributions, cairo is provided by the libcairo2-dev package. Run apt-cache show on that package:

$> apt-cache show libcairo2-dev
Package: libcairo2-dev
Source: cairo
Version: 1.12.2-3
Installed-Size: 2766
Maintainer: Dave Beckett
Architecture: amd64
Provides: libcairo-dev
Depends: libcairo2 (= 1.12.2-3), libcairo-gobject2 (= 1.12.2-3),[...]
Suggests: libcairo2-doc
Description-en: Development files for the Cairo 2D graphics library
Cairo is a multi-platform library providing anti-aliased
vector-based rendering for multiple target backends.
.
This package contains the development libraries, header files needed by
programs that want to compile with Cairo.
Homepage: http://cairographics.org/
Description-md5: 07fe86d11452aa2efc887db335b46f58
Tag: devel::library, role::devel-lib, uitoolkit::gtk
Section: libdevel
Priority: optional
Filename: pool/main/c/cairo/libcairo2-dev_1.12.2-3_amd64.deb
Size: 1160286
MD5sum: e29852ae8e8e5510b00b13dbc201ce66
SHA1: 2ed3534d02c01b8d10b13748c3a02820d10962cf
SHA256: a6099cfbcc6bd891e347dd9abc57b7f137e0fd619deaff39606fd58f0cc60d27
In this case it's the Homepage line that matters, but the process of downloading tarballs is the same as above. For Arch users, the interesting line is URL as well:

$> pacman -Si cairo | grep URL
Repository : extra
Name : cairo
Version : 1.12.16-1
Description : Cairo vector graphics library
Architecture : x86_64
URL : http://cairographics.org/
Licenses : LGPL MPL
....

Now to the complicated bit: In most cases, you shouldn't install the new version over the system version because you may break other things. You're better off installing the dependency into a custom folder ("prefix") and pointing pkg-config to it. So let's say you downloaded the cairo tarball, now you need to run:


$> mkdir $HOME/dependencies/
$> tar xf cairo-someversion.tar.xz
$> cd cairo-someversion
$> autoreconf -ivf
$> ./configure --prefix=$HOME/dependencies
$> make && make install
$> export PKG_CONFIG_PATH=$HOME/dependencies/lib/pkgconfig:$HOME/dependencies/share/pkgconfig
# now go back to original project and run meson again
So you create a directory called dependencies and install cairo there. This will install cairo.pc as $HOME/dependencies/lib/pkgconfig/cairo.pc. Now all you need to do is tell pkg-config that you want it to look there as well - so you set PKG_CONFIG_PATH. If you re-run meson in the original project, pkg-config will find the new version and meson should succeed. If you have multiple packages that all require a newer version, install them into the same prefix and you only need to set PKG_CONFIG_PATH once. Remember that you need to set PKG_CONFIG_PATH in the same shell as you run meson from.

In the case of dependencies that use meson, you replace autotools and make with meson and ninja:


$> mkdir $HOME/dependencies/
$> tar xf foo-someversion.tar.xz
$> cd foo-someversion
$> meson builddir -Dprefix=$HOME/dependencies
$> ninja -C builddir install
$> export PKG_CONFIG_PATH=$HOME/dependencies/lib/pkgconfig:$HOME/dependencies/share/pkgconfig
# now go back to original project and run meson again

If you keep seeing the version error the most common problem is that PKG_CONFIG_PATH isn't set in your shell, or doesn't point to the new cairo.pc file. A simple way to check is:


$> pkg-config --modversion cairo
1.13.1
Is the version number the one you installed or the system one? If it is the system one, you have a typo in PKG_CONFIG_PATH, just re-set it. If it still doesn't work do this:

$> cat $HOME/dependencies/lib/pkgconfig/cairo.pc
prefix=/usr
exec_prefix=/usr
libdir=/usr/lib64
includedir=/usr/include

Name: cairo
Description: Multi-platform 2D graphics library
Version: 1.13.1

Requires.private: gobject-2.0 glib-2.0 >= 2.14 [...]
Libs: -L${libdir} -lcairo
Libs.private: -lz -lz -lGL
Cflags: -I${includedir}/cairo
If the Version field matches what pkg-config returns, then you're set. If not, keep adjusting PKG_CONFIG_PATH until it works. There is a rare case where the Version field in the installed library doesn't match what the tarball said. That's a defective tarball and you should report this to the project, but don't worry, this hardly ever happens. In almost all cases, the cause is simply PKG_CONFIG_PATH not being set correctly. Keep trying :)

Let's assume you've managed to build the dependencies and want to run the newly built project. The only problem is: because you built against a newer library than the one on your system, you need to point it to use the new libraries.


$> export LD_LIBRARY_PATH=$HOME/dependencies/lib
and now you can, in the same shell, run your project.

Good luck!

July 02, 2018

For V3D, last week I spent mostly building up .CLIF dumping support so that I could talk to the HW team about the GPU hangs I’ve been experiencing. This involved reformatting my dump output, naming some missing (0-filled) fields of the GPU packets I emit, and restructuring the dumper so that I could dump the contents of each BO all at once (whereas before my dumps didn’t really care about BOs and would just dump each memory area as it was found in the worklist).

The final result was a CLIF file I sent to the HW team of a simple transform feedback test that was locking up the GPU. They reported that I was missing a wait packet that was required, beyond what the HW specs said. (I had actually earlier tried emitting the wait, but typoed the condition for enabling it!) With that, I think I’ve cleared up all of the GPU hangs in piglit runs of khr_gles3.

On the VC4 front, the DSI panel enable/disable sequencing patch is in, which should enable other people to successfully build DSI panel drivers for Raspberry Pi. Boris has resubmitted the VC4 writeback (TXP) series now that the core writeback support is in, and there were just a couple of cleanups and it should now be ready to land.

June 26, 2018

This post is part of a series: Part 1, Part 2, Part 3, Part 4, Part 5.

In this post I'll describe the X server pointer acceleration for trackpoints. You will need to read Observations on trackpoint input data first to make sense of this post.

As described in that linked post, trackpoint input data varies wildly. Combined with the options the server offers to configure everything, this makes this post a bit pointless, as almost every single behaviour can be changed.

The linked post also describes the three subjective pressure ranges: no real physical pressure, some physical pressure, and serious pressure. The line between the first two ranges is roughly where the trackpoint sends deltas at the maximum reporting rate (100Hz) but with a value of 1. Below that pressure, the intervals increase but the delta remains at 1. Above that pressure, the interval remains constant at 10ms but the deltas increase. I've used the default kernel trackpoint sensitivity of 128 for any data listed here. Here is the visualisation of how deltas and intervals change again.

The default pointer acceleration profile in the X server is the simple profile. We know this from the earlier posts, it has a double-plateau shape. On a trackpoint mm/s doesn't make sense here so let's look at it in units/ms instead. A unit is simply a device-specific measurement of distance/pressure/tilt/whatever - it all depends on the device. On trackpoints that is (mostly) sideways pressure or tilt. On mice and touchpads we can convert units to mm based on their resolution. On trackpoints, we don't have a physical reference and we thus have to deal with it in units. The obvious problem here is that 1 unit on one device does not equal 1 unit on another device. And for configurable trackpoints, the definition of a unit changes as the sensitivity changes. And that's after the kernel already mangles it (if it does, it doesn't for all devices). So here's a box of asterisks, please sprinkle it liberally.

The smallest delta the kernel can send is 1. At a hardware report rate of 100Hz, continuous pressure to the smallest detected threshold thus generates 1 unit every 10 milliseconds or 0.1 units/ms. If I push uncomfortably hard, I can get deltas of around 10 units every 10ms or 1 unit/ms. In other words, we better zoom in here. Let's look at the meaningful range of this curve.

On my trackpoint, below 0.1 units/ms means virtually no pressure (pressure range one). Pressure range two is 0.1 to 0.4, approximately. Beyond that is pressure range three but that is also the range that becomes pointless quickly - I simply wouldn't want to press this hard in normal operation. 1 unit per ms (10 units per report) is very high pressure. This means the pointer acceleration curve is actually defined for the usable range with only outliers hitting the maximum acceleration. For mice this curve was effectively a constant acceleration for all but slow movements (see here). However, any configuration can change this curve to a point where none of the above applies.

Back to the minimum constant movement of 0.1 units/ms. That one effectively matches the start of the 'no accel' plateau. Anything below that will be decelerated, i.e. a delta of 1 unit will result in a pointer delta of less than 1 pixel. In other words, anything up to where you have to apply real pressure is decelerated.

The constant factor plateau goes all the way to 0.4 units/ms. Then there's the buggy jump to a factor of ~1.5, followed by a smooth curve to 0.8 units/ms where the factor maxes out. A bit of testing here suggests that 0.4 units/ms is in the upper limits of the second pressure range mentioned above. Going past 0.6 or 0.7 is definitely well within the third pressure range where things get uncomfortable quickly. This means that the acceleration bug is actually sitting right in the highest interesting range. Apparently no-one has noticed for 10 years.

But what does it matter? Well, probably not even that much. The only interesting bit I can see here is that we have deceleration for most low-pressure movements and a constant acceleration of 1 for most realistic movements. I very much doubt that the range above 0.4 really matters.

But hey, this is just the default configuration. It is affected when someone changes the speed slider in GNOME, or when someone changes the sensitivity at the sysfs level. Other trackpoints won't have the exact same behaviour. Any analysis is thrown out of the window as soon as someone changes the sysfs sensitivity or increases the acceleration threshold.

Let's talk sysfs - if we increase my trackpoint sensitivity to 200, the deltas coming from the trackpoint change. First, the pressure required to give me a constant stream of events often gives me deltas of size 2 or 3. So we're half-way into the no acceleration plateau here. Higher pressures easily give me deltas of size 10 or 1 unit per ms, the edge of the image above.

I wish I could analyse this any further but realistically, the only takeaway here is that any change in configuration options results in some version of trial-and-error by the user until the trackpoint moves as they want to. But without knowing all those options, we just cannot know what exactly is happening.

However, what this is useful for is comparing it to libinput. libinput got a custom trackpoint acceleration function in 1.8, designed around the hardware delta range. The idea was that you (or someone) measures the trackpoint device's range once; if it's outside of the assumed default ranges, we add a hwdb entry and voila, it scales back to the right ranges and that device is fixed for good.

Except - this doesn't work. libinput scales into the delta range and calculates the factor from that, but it doesn't take the timestamps into account. It works on the assumption that trackpoint deltas arrive at a constant frequency with a varying delta. That is simply not the case, and the dynamic range of the trackpoint is so small that any acceleration of the deltas results in jerky movement.

This is of course fixable, we can just convert the deltas into a speed and then apply the acceleration curve based on that. So that's the next task, if you're interested in that, subscribe yourself to this issue.
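A minimal sketch of that idea, assuming each delta arrives together with an event timestamp; the names below are made up for illustration and are not libinput API:

/* Hypothetical helper: turn a trackpoint delta plus its timestamp into a
 * speed in units/ms, instead of assuming a fixed 10ms report interval. */
#include <stdint.h>

struct tp_speed_state {
	uint64_t last_time_usec;	/* timestamp of the previous event, 0 if none yet */
};

static double
trackpoint_speed(struct tp_speed_state *state, double delta_units, uint64_t time_usec)
{
	double dt_ms;

	if (state->last_time_usec == 0)
		dt_ms = 10.0;		/* assume one 100Hz interval for the very first event */
	else
		dt_ms = (time_usec - state->last_time_usec) / 1000.0;

	state->last_time_usec = time_usec;

	/* The acceleration curve would then be indexed by this speed rather
	 * than by the raw delta. */
	return delta_units / dt_ms;
}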

Portable Services with systemd v239

systemd v239 contains a great number of new features. One of them is first class support for Portable Services. In this blog story I'd like to shed some light on what they are and why they might be interesting for your application.

What are "Portable Services"?

The "Portable Service" concept takes inspiration from classic chroot() environments as well as container management and brings a number of their features to more regular system service management.

While the definition of what a "container" really is is hotly debated, I figure people can generally agree that the "container" concept primarily provides two major features:

  1. Resource bundling: a container generally brings its own file system tree along, bundling any shared libraries and other resources it might need along with the main service executables.

  2. Isolation and sand-boxing: a container operates in a name-spaced environment that is relatively detached from the host. Besides living in its own file system namespace it usually also has its own user database, process tree and so on. Access from the container to the host is limited with various security technologies.

Of these two concepts the first one is also what traditional UNIX chroot() environments are about.

Both resource bundling and isolation/sand-boxing are concepts systemd has implemented to varying degrees for a longer time. Specifically, RootDirectory= and RootImage= have been around for a long time, and so have been the various sand-boxing features systemd provides. The Portable Services concept builds on that, putting these features together in a new, integrated way to make them more accessible and usable.

OK, so what precisely is a "Portable Service"?

Much like a container image, a portable service on disk can be just a directory tree that contains service executables and all their dependencies, in a hierarchy resembling the normal Linux directory hierarchy. A portable service can also be a raw disk image, containing a file system containing such a tree (which can be mounted via a loop-back block device), or multiple file systems (in which case they need to follow the Discoverable Partitions Specification and be located within a GPT partition table). Regardless whether the portable service on disk is a simple directory tree or a raw disk image, let's call this concept the portable service image.

Such images can be generated with any tool typically used for the purpose of installing OSes inside some directory, for example dnf --installroot= or debootstrap. There are very few requirements made on these trees, except the following two:

  1. The tree should carry systemd unit files for relevant services in them.

  2. The tree should carry /usr/lib/os-release (or /etc/os-release) OS release information.

Of course, as you might notice, OS trees generated from any of today's big distributions generally qualify for these two requirements without any further modification, as pretty much all of them adopted /usr/lib/os-release and tend to ship their major services with systemd unit files.

A portable service image generated like this can be "attached" or "detached" from a host:

  1. "Attaching" an image to a host is done through the new portablectl attach command. This command dissects the image, reading the os-release information and searching for unit files in it. It then copies relevant unit files out of the image and into /etc/systemd/system/. After that it augments any copied service unit files in two ways: a drop-in adding a RootDirectory= or RootImage= line is added in, so that even though the unit files are now available on the host, when started they run the referenced binaries from the image. It also symlinks in a second drop-in, called a "profile", which is supposed to carry additional security settings to enforce on the attached services, to ensure the right amount of sand-boxing.

  2. "Detaching" an image from the host is done through portablectl detach. It reverses the steps above: the unit files copied out are removed again, and so are the two drop-in files generated for them.

While a portable service is attached its relevant unit files are made available on the host like any others: they will appear in systemctl list-unit-files, you can enable and disable them, you can start them and stop them. You can extend them with systemctl edit. You can introspect them. You can apply resource management to them like to any other service, and you can process their logs like any other service and so on. That's because they really are native systemd services, except that they have a 'twist' if you so will: they have tougher security by default and store their resources in a root directory or image.

And that's already the essence of what Portable Services are.

A couple of interesting points:

  1. Even though the focus is on shipping service unit files in portable service images, you can actually ship timer units, socket units, target units, path units in portable services too. This means you can very naturally do time, socket and path based activation. It's also entirely fine to ship multiple service units in the same image, in case you have more complex applications.

  2. This concept introduces zero new metadata. Unit files are an existing concept, as are os-release files, and — in case you opt for raw disk images — GPT partition tables are already established too. This also means existing tools to generate images can be reused for building portable service images to a large degree as no completely new artifact types need to be generated.

  3. Because the Portable Service concept introduces zero new metadata and just builds on existing security and resource bundling features of systemd it's implemented in a set of distinct tools, relatively disconnected from the rest of systemd. Specifically, the main user-facing command is portablectl, and the actual operations are implemented in systemd-portabled.service. If you so will, portable services are a true add-on to systemd, just making a specific work-flow nicer to use than with the basic operations systemd otherwise provides. Also note that systemd-portabled provides bus APIs accessible to any program that wants to interface with it; portablectl is just one tool that happens to be shipped along with systemd.

  4. Since Portable Services are a feature we only added very recently we wanted to keep some freedom to make changes still. Due to that we decided to install the portablectl command into /usr/lib/systemd/ for now, so that it does not appear in $PATH by default. This means, for now you have to invoke it with a full path: /usr/lib/systemd/portablectl. We expect to move it into /usr/bin/ very soon though, and make it a fully supported interface of systemd.

  5. You may wonder which unit files contained in a portable service image are the ones considered "relevant" and are actually copied out by the portablectl attach operation. Currently, this is derived from the image name. Let's say you have an image stored in a directory /var/lib/portables/foobar_4711/ (or alternatively in a raw image /var/lib/portables/foobar_4711.raw). In that case the unit files copied out match the pattern foobar*.service, foobar*.socket, foobar*.target, foobar*.path, foobar*.timer.

  6. The Portable Services concept does not define any specific method how images get on the deployment machines, that's entirely up to administrators. You can just scp them there, or wget them. You could even package them as RPMs and then deploy them with dnf if you feel adventurous.

  7. Portable service images can reside in any directory you like. However, if you place them in /var/lib/portables/ then portablectl will find them easily and can show you a list of images you can attach and suchlike.

  8. Attaching a portable service image can be done persistently, so that it remains attached on subsequent boots (which is the default), or it can be attached only until the next reboot, by passing --runtime to portablectl.

  9. Because portable service images are ultimately just regular OS images, it's natural and easy to build a single image that can be used in three different ways:

    1. It can be attached to any host as a portable service image.

    2. It can be booted as OS container, for example in a container manager like systemd-nspawn.

    3. It can be booted as host system, for example on bare metal or in a VM manager.

    Of course, to qualify for the latter two the image needs to contain more than just the service binaries, the os-release file and the unit files. To be bootable in an OS container manager such as systemd-nspawn the image needs to contain an init system of some form, for example systemd. To be bootable on bare metal or as a VM it also needs a boot loader of some form, for example systemd-boot.

Profiles

In the previous section the "profile" concept was briefly mentioned. Since they are a major feature of the Portable Services concept, they deserve some focus. A "profile" is ultimately just a pre-defined drop-in file for unit files that are attached to a host. They are supposed to mostly contain sand-boxing and security settings, but may actually contain any other settings, too. When a portable service is attached a suitable profile has to be selected. If none is selected explicitly, the default profile called default is used. systemd ships with four different profiles out of the box:

  1. The default profile provides a medium level of security. It contains settings to drop capabilities, enforce system call filters, restrict many kernel interfaces and mount various file systems read-only.

  2. The strict profile is similar to the default profile, but generally uses the most restrictive sand-boxing settings. For example networking is turned off and access to AF_NETLINK sockets is prohibited.

  3. The trusted profile is the least strict of them all. In fact it makes almost no restrictions at all. A service run with this profile has basically full access to the host system.

  4. The nonetwork profile is mostly identical to default, but also turns off network access.

Note that the profile is selected at the time the portable service image is attached, and it applies to all service files attached, in case multiple are shipped in the same image. Thus, the sand-boxing restrictions to enforce are selected by the administrator attaching the image and not by the image vendor.

Additional profiles can be defined easily by the administrator, if needed. We might also add additional profiles sooner or later to be shipped with systemd out of the box.

What's the use-case for this? If I have containers, why should I bother?

Portable Services are primarily intended to cover use-cases where code should feel more like "extensions" to the host system rather than live in disconnected, separate worlds. The profile concept is supposed to be tunable to the exact right amount of integration or isolation needed for an application.

In the container world the concept of "super-privileged containers" has been touted a lot, i.e. containers that run with full privileges. It's precisely that use-case that portable services are intended for: extensions to the host OS, that default to isolation, but can optionally get as much access to the host as needed, and can naturally take benefit of the full functionality of the host. The concept should hence be useful for all kinds of low-level system software that isn't shipped with the OS itself but needs varying degrees of integration with it. Besides servers and appliances this should be particularly interesting for IoT and embedded devices.

Because portable services are just a relatively small extension to the way system services are otherwise managed, they can be treated like regular services for almost all use-cases: they will appear alongside regular services in all tools that can introspect systemd unit data, and can be managed the same way when it comes to logging, resource management, runtime life-cycles and so on.

Portable services are a very generic concept. While the original use-case is OS extensions, it's of course entirely up to you and other users to use them in a suitable way of your choice.

Walkthrough

Let's have a look how this all can be used. We'll start with building a portable service image from scratch, before we attach, enable and start it on a host.

Building a Portable Service image

As mentioned, you can use any tool you like that can create OS trees or raw images for building Portable Service images, for example debootstrap or dnf --installroot=. For this example walkthrough run we'll use mkosi, which is ultimately just a fancy wrapper around dnf and debootstrap but makes a number of things particularly easy when repetitively building images from source trees.

I have pushed everything necessary to reproduce this walkthrough locally to a GitHub repository. Let's check it out:

$ git clone https://github.com/systemd/portable-walkthrough.git

Let's have a look in the repository:

  1. First of all, walkthroughd.c is the main source file of our little service. To keep things simple it's written in C, but it could be in any language of your choice. The daemon as implemented won't do much: it just starts up and waits for SIGTERM, at which point it will shut down. It's ultimately useless, but hopefully illustrates how this all fits together. The C code has no dependencies besides libc. A rough sketch of what such a daemon can look like follows after this list.

  2. walkthroughd.service is a systemd unit file that starts our little daemon. It's a simple service, hence the unit file is trivial.

  3. Makefile is a short make build script to build the daemon binary. It's pretty trivial, too: it just takes the C file and builds a binary from it. It can also install the daemon. It places the binary in /usr/local/lib/walkthroughd/walkthroughd (why not in /usr/local/bin? because it's not a user-facing binary but a system service binary), and its unit file in /usr/local/lib/systemd/walkthroughd.service. If you want to test the daemon on the host we can now simply run make and then ./walkthroughd in order to check everything works.

  4. mkosi.default is a file that tells mkosi how to build the image. We opt for a Fedora-based image here (but we might as well have used Debian, or any other supported distribution). We need no particular packages during runtime (after all, we only depend on libc), but during the build phase we need gcc and make, hence these are the only packages we list in BuildPackages=.

  5. mkosi.build is a shell script that is invoked during mkosi's build logic. All it does is invoke make and make install to build and install our little daemon, and afterwards it extends the distribution-supplied /etc/os-release file with an additional field that describes our portable service a bit.
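As mentioned in the description of walkthroughd.c above, here is a rough sketch of what such a daemon could look like. This is not the actual walkthroughd.c from the repository, just an illustration of a libc-only service that initializes, waits for SIGTERM and exits:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t terminate = 0;

static void handle_sigterm(int sig) {
        (void) sig;
        terminate = 1;
}

int main(void) {
        struct sigaction sa = { 0 };

        sa.sa_handler = handle_sigterm;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGTERM, &sa, NULL);

        printf("Initializing.\n");
        fflush(stdout);

        while (!terminate)
                pause();        /* sleep until a signal arrives, e.g. SIGTERM from systemd */

        printf("Exiting.\n");
        return 0;
}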

Let's now use this to build the portable service image. For that we use the mkosi tool. It's sufficient to invoke it without parameters to build the first image: it will automatically discover mkosi.default and mkosi.build, which tell it what to do. (Note that if you work on a project like this for a longer time, mkosi -if is probably the better command to use, as it speeds up building substantially by using an incremental build mode.) mkosi will download the necessary RPMs and put them all together. It will build our little daemon inside the image, and after all that's done it will output the resulting image: walkthroughd_1.raw.

Because we opted to build a GPT raw disk image in mkosi.default this file is actually a raw disk image containing a GPT partition table. You can use fdisk -l walkthroughd_1.raw to enumerate the partition table. You can also use systemd-nspawn -i walkthroughd_1.raw to explore the image quickly if you need.

Using the Portable Service Image

Now that we have a portable service image, let's see how we can attach, enable and start the service included within it.

First, let's attach the image:

# /usr/lib/systemd/portablectl attach ./walkthroughd_1.raw
(Matching unit files with prefix 'walkthroughd'.)
Created directory /etc/systemd/system/walkthroughd.service.d.
Written /etc/systemd/system/walkthroughd.service.d/20-portable.conf.
Created symlink /etc/systemd/system/walkthroughd.service.d/10-profile.conf → /usr/lib/systemd/portable/profile/default/service.conf.
Copied /etc/systemd/system/walkthroughd.service.
Created symlink /etc/portables/walkthroughd_1.raw → /home/lennart/projects/portable-walkthrough/walkthroughd_1.raw.

The command will show you exactly what it has been doing: it just copied the main service file out and added the two drop-ins, as expected.

Let's see if the unit is now available on the host, just like a regular unit, as promised:

# systemctl status walkthroughd.service
● walkthroughd.service - A simple example service
   Loaded: loaded (/etc/systemd/system/walkthroughd.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/walkthroughd.service.d
           └─10-profile.conf, 20-portable.conf
   Active: inactive (dead)

Nice, it worked. We see that the unit file is available and that systemd correctly discovered the two drop-ins. The unit is neither enabled nor started, however. Yes, attaching a portable service image implies neither enabling nor starting it. It just means the unit files contained in the image are made available to the host. It's up to the administrator to then enable them (so that they are automatically started when needed, for example at boot), and/or start them (in case they shall run right away).

Let's now enable and start the service in one step:

# systemctl enable --now walkthroughd.service
Created symlink /etc/systemd/system/multi-user.target.wants/walkthroughd.service → /etc/systemd/system/walkthroughd.service.

Let's check if it's running:

# systemctl status walkthroughd.service
● walkthroughd.service - A simple example service
   Loaded: loaded (/etc/systemd/system/walkthroughd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/walkthroughd.service.d
           └─10-profile.conf, 20-portable.conf
   Active: active (running) since Wed 2018-06-27 17:55:30 CEST; 4s ago
 Main PID: 45003 (walkthroughd)
    Tasks: 1 (limit: 4915)
   Memory: 4.3M
   CGroup: /system.slice/walkthroughd.service
           └─45003 /usr/local/lib/walkthroughd/walkthroughd

Jun 27 17:55:30 sigma walkthroughd[45003]: Initializing.

Perfect! We can see that the service is now enabled and running. The daemon is running as PID 45003.

Now that we verified that all is good, let's stop, disable and detach the service again:

# systemctl disable --now walkthroughd.service
Removed /etc/systemd/system/multi-user.target.wants/walkthroughd.service.
# /usr/lib/systemd/portablectl detach ./walkthroughd_1.raw
Removed /etc/systemd/system/walkthroughd.service.
Removed /etc/systemd/system/walkthroughd.service.d/10-profile.conf.
Removed /etc/systemd/system/walkthroughd.service.d/20-portable.conf.
Removed /etc/systemd/system/walkthroughd.service.d.
Removed /etc/portables/walkthroughd_1.raw.

And finally, let's see that it's really gone:

# systemctl status walkthroughd
Unit walkthroughd.service could not be found.

Perfect! It worked!

I hope the above gets you started with Portable Services. If you have further questions, please contact our mailing list.

Further Reading

A more low-level document explaining details is shipped along with systemd.

There are also relevant manual pages: portablectl(1) and systemd-portabled(8).

For further information about mkosi see its homepage.

June 25, 2018

For V3D, last week included:

  • Created the DRM fourcc for talking about the new UIF tiling between processes.
  • Fixed offsets of buffers shared between processes.
  • Reduced CPU overhead and binary size of V3D and VC4 release builds.
  • Fixed min/mag determination in non-mipmapped texture filtering modes.
  • Implemented ALPHA_TO_COVERAGE.
  • Fixed flushing of jobs writing to separate stencil buffers.
  • Fixed the return status of ClientWaitSync.
  • Fixed blits from linear winsys BOs.
  • Wrote a workaround for broken transform feedback setup with gallium NIR.

For VC4, I fixed a regression in display initialization in drm-misc-next. I’ve also been doing some study of the HVS to help Boris with the T-format scanout offset patch. Hopefully with what I figured out today, he can get it all working. I also respun my DSI enable/disable sequencing patch to not need any changes in the core.

Finally, I put together the branches for bcm2835 maintainership for kernel 4.19. This week I should PR them.

June 24, 2018


Configuring the QNAP device

The first step is to SSH into your QNAP device using the admin account.

ssh admin@NAS_IP
cat >> /share/Container/container-station-data/lib/lxc/CONTAINER_NAME/config << EOF
lxc.cgroup.devices.allow = c 10:200 rwm
EOF

Configuring the container guest

The second step is to open the QNAP web-ui, open the Container Station application and enter the console of your container.

sed -i '/exit 0/d' /etc/rc.local
cat >> /etc/rc.local << EOF
if ! [ -c /dev/net/tun ]; then
    mkdir -p /dev/net
    mknod -m 666 /dev/net/tun c 10 200
fi

exit 0
EOF

Lastly the container needs to be restarted, and then your VPN application should be able to access TUN devices and work normally.

June 22, 2018
In March 1986, my dad was in the market for a Thomson TO7/70. I have the circled classified ads in “Téo” issue 1 to prove that :)



TO7/70 with its chiclet keyboard and optical pen, courtesy of MO5.com

The “Plan Informatique pour Tous” was in full swing, and Thomson were supplying schools with micro-computers. My dad, as a primary school teacher, needed to know how to operate those computers, and eventually teach them to kids.

The first thing he showed us when he got the computer, on the living room TV, was a game called “Panic” or “Panique” where you controlled a missile, protecting a town from flying saucers that flew across the screen from either side, faster and faster as the game went on. I still haven't been able to locate this game again.

A couple of years later, the TO7/70 was replaced by a TO9, with a floppy disk, and my dad used that computer to write an educational software about top-down additions, as part of a training program run by the teachers schools (“Écoles Normales” renamed to “IUFM“ in 1990).

After months of nagging, and some spring cleaning, he found the listings of his educational software, which I've liberated, with his permission. I'm currently still working out how to generate floppy disks that are usable directly in emulators. But here's an early screenshot.


Later on, my dad got an IBM PC compatible, an Olivetti PC/1, on which I'd play a clone of Asteroids for hours, but that's another story. The TO9 got passed down to me, and after spending a full summer doing planning for my hot-dog and chips van business (I was 10 or 11, and I had weird hobbies already), and entering every game from the “102 Programmes pour...” series of books, the TO9 got put to the side at Christmas, replaced by a Sega Master System, using that same handy SCART connector on the Thomson monitor.

But how does this concern you? Well, I worked with RetroManCave on a Minitel episode not too long ago, and he agreed to do a history of the Thomson micro-computers. I did a fair bit of the research and fact-checking, as well as some needed repairs to the (prototype!) hardware I managed to find for the occasion. The result is this first look at the history of Thomson.



Finally, if you fancy diving into the Thomson computers, there will be an episode coming shortly about the MO5E hardware, and some games worth running on it, on the same YouTube channel.

I'm currently working on bringing the “TeoTO8D” emulator to Flathub, for Linux users. When that's ready, grab some games from the DCMOTO archival site, and have some fun!

I'll also be posting some nitty gritty details about Thomson repairs on my Micro Repairs Twitter feed for the more technically inclined among you.

[Figure: twistyplexing LED layout]

The above layout has N = 7, yielding 42 LEDs.

Apart from the symmetry being visually pleasing compared to the normal row & column Charlieplexing layouts, it's relatively easy to spot errors in the schematic.

Avoiding lookup tables

The major advantage of twistyplexing is the ability to avoid lookup tables and replace them with some relatively straightforward arithmetic.

row = led_number / (N - 1)
column = led_number % (N - 1)
anode = (row + column + 1) % N

Of course the cathode still has to be controlled, but its pin id is already defined by the row variable above.
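As a quick illustration, the arithmetic above translates directly into a few lines of C; the pin numbering (0..N-1) and the led_t struct are conventions of this sketch, not something defined by the original post:

#include <stdint.h>

#define N 7	/* number of I/O pins, giving N * (N - 1) = 42 LEDs */

typedef struct {
	uint8_t anode;		/* pin to drive high */
	uint8_t cathode;	/* pin to drive low */
} led_t;

static led_t led_pins(uint8_t led_number)
{
	uint8_t row    = led_number / (N - 1);
	uint8_t column = led_number % (N - 1);
	led_t led;

	led.anode   = (row + column + 1) % N;
	led.cathode = row;	/* the cathode is simply the row */
	return led;
}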

Thanks

The Twistyplexing concept was created by Tom Yu, and defined in this blog post.

June 19, 2018

Most of the past few weeks have been focused on improving the V3D driver’s dEQP/VK-GL-CTS conformance rates. The piglit infrastructure for the conformance tests is strange, and there doesn’t seem to be a way in tree to do an approximation of a proper conformance run, so I’ve had to glue together the deqp and khr_gles profiles, with some overrides for generating the khr_gles test list to be run (since we don’t seem to have a way to run a conformance mustpass list). There were lots of deqp failures when I started, due to not having run that suite before, but I’m down to a <.5% failure rate on simulation now.

On the testing front, thanks to Maxime we now have a vc4 testlist in the i-g-t suite. Hopefully the Raspberry Pi foundation can start using that to get some coverage of the vc4 driver when they update their kernel – as is, it seems my PRs mostly languish until Dom has time to hand-test them, which is absurd.

Boris has been working on supporting plane X/Y offsets (panning) with the T tiling format. He has it working on every other tile row, and I’m concerned that the HW may just be broken, since T scanout didn’t see a whole lot of testing as far as I know.

I wrote a patchset to core DRM to let drivers control the ordering of calling into bridges, which is important for DSI where you want to have the bridge prepare be before video packets are scanned out, but after the module is ready to send DSI transactions. This should let more DSI panels work with the Raspberry Pi, but the patch needs a rework after review.

June 15, 2018
After 8 years of doing graphics driver development in i915, i965, and misc for Intel, I have decided to take on other things. I will remain at Intel, and be working on… FreeBSD development. It’s a pretty vague gig at the moment, so I’d be happy to hear of areas in FreeBSD that you think […]
As I mentioned two weeks ago, I’ve transitioned into a new role at Intel. The team is very new and so a lot of my part right now is helping out in organizing the game plan. Last week I attended BSDCan 2018 as well as the FreeBSD dev summit. That trip in addition to feedback […]
June 13, 2018

This post does not describe a configuration system. If that's all you care about, read this post here and go be angry at someone else. Anyway, with that out of the way let's get started.

For a long time, libinput has supported model quirks (first added in Apr 2015). These model quirks are bitflags applied to some devices so we can enable special behaviours in the code. Model flags can be very specific ("this is a Lenovo x230 Touchpad") or generic ("This is a trackball") and it just depends on what the specific behaviour is that we need. The x230 touchpad for example has a custom pointer acceleration but trackballs are marked so they get some config options mice don't have/need.

In addition to model tags we also have custom attributes. These are free-form and provide information that we cannot get from the kernel. These too can be specific ("this model needs a pressure threshold of N") or generic ("bluetooth keyboards are external keyboards").

Overall, it's a good system. Most users never have to care that we even have this. The whole point is that any device-specific quirks need to be merged only once for each model, then everyone with the same device gets to benefit on the next update.

Originally quirks were hardcoded but this required rebuilding libinput for any changes. So we moved this to utilise the udev hwdb. For the trivial work of fetching udev properties we got a lot of flexibility in how we can match against devices. For example, an entry may look like this:


libinput:name:*AlpsPS/2 ALPS GlidePoint:dmi:*svnDellInc.:pnLatitudeE6220:*
LIBINPUT_ATTR_PRESSURE_RANGE=100:90
The above uses a name match and the dmi modalias match to apply a property for the touchpad on the Dell Latitude E6220. The exact match format is defined by a bunch of udev rules that ship as part of libinput.

Using the udev hwdb made the quirk storage a plaintext file that can be updated independently of libinput, including local overrides for testing things before merging them upstream. Having said that, it's definitely not public API and can change even between stable branch updates as properties are renamed or re-scoped to fit the behaviour more accurately. For example, a model-specific tag may be renamed to a behaviour-specific tag as we find more devices affected by the same issue.

The main issue with the quirks now is that we keep accumulating more and more of them and I'm starting to hit limits with the udev hwdb match behaviour. The hwdb is great for single matches but not so great for cascading matches where one match may overwrite another match. The hwdb match system is largely implementation-defined so it's not always predictable which match rule wins out in the end.

Second, debugging the udev hwdb is not at all trivial. It's a bit like git - once you're used to it it's just fine but until then the air turns yellow with all the swearing being excreted by the unsuspecting user.

So long story short, libinput 1.12 will replace the hwdb model quirks database with a set of .ini files. The model quirks will be installed in /usr/share/libinput/ or whatever prefix your distribution prefers instead. It's a bunch of files with fairly simplistic instructions, each [section] has a set of MatchFoo=Bar directives and the ModelFoo=bar or AttrFoo=bar tags. See this file for an example. If all MatchFoo directives apply to a device, the Model and Attr tags are applied. Matching works in inter- and intra-file sequential order so the last section in a file overrides the first section of that file and the highest-sorting file overrides the lowest-sorting file. Otherwise the tags are accumulated, so if two files match on the same device with different tags, both tags are applied. So far, so unexciting.

Sometimes it's necessary to install a temporary local quirk until upstream libinput is updated or the distribution updates its package. For this, the /etc/libinput/local-overrides.quirks file is read in as well (if it exists). Note though that the config files are considered internal API, so any local overrides may stop working on the next libinput update. Should've upstreamed that quirk, eh?

These files give us the same functionality as the hwdb - we can drop in extra files without recompiling. They're more human-readable than a hwdb match and it's a lot easier to add extra match conditions to it. And we can extend the file format at will. But the biggest advantage is that we can quite easily write debugging tools to figure out why something works or doesn't work. The libinput list-quirks tool shows what tags apply to a device and using the --verbose flag shows you all the files and sections and how they apply or don't apply to your device.

As usual, the libinput documentation has details.

June 12, 2018
Fingerprint readers are more and more common on Windows laptops, and hardware makers would really like to not have to make a separate SKU without the fingerprint reader just for Linux, if that fingerprint reader is unsupported there.

The original makers of those fingerprint readers just need to send patches to the libfprint Bugzilla, I hear you say, and the problem's solved!

But it turns out it's pretty difficult to write those new drivers, and those patches, without an insight on how the internals of libfprint work, and what all those internal, undocumented APIs mean.

Most of the drivers already present in libfprint are the results of reverse engineering, which means that none of them is a best-of-breed example of a driver, with all the unknown values and magic numbers.

Let's try to fix all this!

Step 1: fail faster

When you're writing a driver, the last thing you want is to have to wait for your compilation to fail. We ported libfprint to meson and shaved off a significant amount of time from a successful compilation. We also reduced the number of places where new drivers need to be declared to be added to the compilation.

Step 2: make it clearer

While doxygen is nice because it requires very little scaffolding to generate API documentation, the output is also not up to the level we expect. We ported the documentation to gtk-doc, which has a more readable page layout, easy support for cross-references, and gives us more control over how introductory paragraphs are laid out. See the before and after for yourselves.

Step 3: fail elsewhere

You created your patch locally, tested it out, and it's ready to go! But you don't know about git-bz, and you ended up attaching a patch file which you uploaded. Except you uploaded the wrong patch. Or the patch with the right name but from the wrong directory. Or you know git-bz but used the wrong commit id and uploaded another unrelated patch. This is all a bit too much.

We migrated our bugs and repository for both libfprint and fprintd to Freedesktop.org's GitLab. Merge Requests are automatically built, discussions are easier to follow!

Step 4: show it to me

Now that we have spiffy documentation, unified bug, patches and sources under one roof, we need to modernise our website. We used GitLab's CI/CD integration to generate our website from sources, including creating API documentation and listing supported devices from git master, to reduce the need to search the sources for that information.

Step 5: simplify

This process has started, but isn't finished yet. We're slowly splitting up the internal API between "internal internal" (what the library uses to work internally) and "internal for drivers" which we eventually hope to document to make writing drivers easier. This is partially done, but will need a lot more work in the coming months.

TL;DR: We migrated libfprint to meson, gtk-doc, GitLab, added a CI, and are writing docs for driver authors, everything's on the website!
June 07, 2018

So you have probably noticed by now that we started offering some 3rd party software in the latest Fedora Workstation, namely Google Chrome, Steam, the NVidia driver and PyCharm. This has come about due to a long discussion in the Fedora community on how we position Fedora Workstation and how we can improve our user experience. You can read up on the principles we base this policy on in this policy document. To sum it up, the idea is that while the Fedora operating system you install will continue, as it has been for the last decade, to be based only on free software (with an exception for firmware), you will be able to more easily find and install the plethora of applications out there through our software store application, GNOME Software. We also expect that, as the world of Linux software moves towards containers in general and Flatpaks specifically, we will have an increasing number of these 3rd party applications available in Fedora.

So the question I know some of you will have is: what does one actually have to do in order to get a 3rd party application listed in Fedora Workstation? Well, wonder no longer, as we have now put up a few documents outlining the steps you will need to take. Compared to traditional Linux packaging, the major difference is the requirement for improved metadata for your application, so we are covering that aspect in special detail. These documents cover both RPMs and Flatpaks.

First of all you can get a general overview from our 3rd Party guidelines document. This document also explains how you submit a request to the Fedora Workstation Working group for your application to be added.

Then, if you want to dig into the details of what metadata you need to create for your application, there is the in-depth metadata tutorial here, and finally, once you are ready to set up your repository, there is a tutorial explaining how to ensure your repository is able to provide the metadata you created above.

We expect to add more and more applications to Fedora Workstation over time here, and I would especially recommend that you look into Flatpaking your 3rd party application as it will decouple your application from the host operating system and thus decrease the workload on you maintaining your application for use in Fedora Workstation (and elsewhere).

This time we talk trackpoints. Or pointing sticks, or whatever else you want to call that thing between the GHB keys. If you don't have one and you've never seen one, prepare to be amazed. [1]

Trackpoints are tiny joysticks that react to pressure [2], convert that pressure into relative x/y events and pass that on to whoever is interested in it. The harder you push, the higher the deltas. This is where the simple and obvious stops and it gets difficult. But then again, if it was that easy I wouldn't write this post, you wouldn't have anything to read, so somehow everyone wins. Whoop-dee-doo.

All the data and measurements below refer to my trackpoint, a Lenovo T440s. It may not apply to any other trackpoints, including those on on different laptop models or even on the same laptop model with different firmware versions. I've written the below with a lot of cringing and handwringing. I want to write data that is irrefutable, but the universe is against me and what the universe wants, the universe gets. Approximately every second sentence below has a footnote of "actual results may vary". Feel free to re-create the data on your device though.

Measuring trackpoint range is highly subjective, so you'll have to trust me when I describe how specific speeds/pressure ranges feel. There are three ranges of pressure on my trackpoint (sort-of):

  • Pressure range one: When resting the finger on the trackpoint I don't really need to apply noticeable pressure to make the trackpoint send events. Just moving the finger on the trackpoint makes it send events, albeit sporadically.
  • Pressure range two: Going beyond range one requires applying real pressure and feels to me like we're getting into RSI territory. Not a problem for short periods, but definitely not something I'd want all the time. It's the pressure I'd use to cross the screen.
  • Pressure range three: I have to push hard. I definitely wouldn't want to do this during everyday interaction and it just feels wrong anyway. This pressure range is for testing maximum deltas, not one you would want to use otherwise.
The first and second ranges are more easily delineated than the second and third because going from almost no pressure to some real pressure is easy. Going from some pressure to too much pressure is blurrier; there is some overlap between the second and third range. Either way, keep these ranges in mind, as I'll be using them in the explanations below.

Ok, so with the physical conditions explained, let's look at what we have to worry about in software:

  • It is impossible to provide a constant input to a trackpoint if you're a puny human. Without a robotic setup you just cannot apply constant pressure so any measurements have some error. You also get to enjoy a feedback loop - pressure influences pointer motion but that pointer motion influences how much pressure you inadvertently apply. This makes any comparison filled with errors. I don't know if I'm applying the same pressure on the two devices I'm testing, I don't know if a user I'm asking to test something uses constant/the same/the right pressure.
  • Not all trackpoints are created equal. Some trackpoints (mostly in Lenovos) have configurable sensitivity - 256 levels of it. [3] So one trackpoint measured does not equal another trackpoint unless you keep track of the firmware-set sensitivity. Those trackpoints also have other toggles. More importantly and AFAIK, this type of trackpoint also has a built-in acceleration curve. [4] Other trackpoints (ALPS) just have a fixed sensitivity; I have no idea whether those have a built-in acceleration curve or merely a linear-ish pressure->delta mapping.

    Due to some design choices we made years ago, systemd increases the sensitivity on some devices (the POINTINGSTICK_SENSITIVITY property). So even on a vanilla install, you can't actually rely on the trackpoint being set to the manufacturer default. This was an attempt to make trackpoints behave more consistently: systemd had the hwdb and it seemed like the right place to put device-specific quirks. In hindsight, it was the wrong design choice.
  • Deltas are ... unreliable. At high sensitivity and high pressures you might get a sequence of [7, 7, 14, 8, 3, 7]. At lower pressure you get the deltas at seemingly random intervals. This could be because it's hard to keep exact constant pressure, it could be a hardware issue.
  • evdev has been the default driver for almost a decade and before that it was the mouse driver for a long time. So the kernel will "Divide 4 since trackpoint's speed is too fast" [sic] for some trackpoints. Or by 8. Or not at all. In other words, the kernel adjusts for what the default user space is and userspace is based on what the kernel provides. On the newest ALPS trackpoints the kernel has stopped doing any in-kernel scaling (good!) but that means that the deltas are out by a factor of 8 now.
  • Trackpoints don't always have the same pressure ranges for x/y. AFAICT the y range is usually a bit less than the x range on many or most trackpoints. A bit weird because the finger position would suggest that strong vertical pressure is easier to apply than sideways pressure.
  • (Some? All?) Trackpoints have built-in calibration procedures to find and set their own center-point. Without that you'll get the trackpoint eventually being ever so slightly off center over time, causing a mouse pointer that just wanders off the screen, possibly into the woods, without the obligatory red cape and basket full of whatever grandma eats when she's sick.

    So the calibration is required but can be triggered accidentally by the user: If you push with the same pressure into the same direction for 2-5 seconds (depending on $THINGS) you trigger the calibration procedure and the current position becomes the new center point. When you release, the cursor wanders off for a few seconds until the calibration sets things straight again. If you ever see the cursor buzz off in a fixed direction or walking backwards for a centimetre or two you've triggered that calibration. The only way to avoid this is to make sure the pointer acceleration mechanism allows you to reach any target within 2 seconds and/or never forces you to apply constant pressure for more than 2 seconds. Now there's a challenge...

Ok. If you've been paying attention instead of hoping for a TLDR that's more elusive than Godot, we're now aware of the various drawbacks of collecting data from a trackpoint. Let's go and look at data. Sensitivity is set to the kernel default of 128 in sysfs, the default reporting rate is 100Hz. All observations are YMMV and whatnot, especially the latter.

Trackpoint deltas are in integers but the dynamic range of delta values is tiny. You mostly get 1 or 2 and it requires quite a fair bit of pressure to get up to 5 or more. At low pressure you get deltas of 1, but less frequently. Visualised, the relationship between deltas and the interval between deltas is like this:

At low pressure, we get deltas of 1 but high intervals. As the pressure increases, the interval between events shrinks until at some point the interval between events matches the reporting rate (100Hz/10ms). Increasing the pressure further now increases the deltas while the intervals remain at the reporting rate. For example, here's an event sequence at low pressure:

E: 63796.187226 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +20ms
E: 63796.227912 0002 0001 0001 # EV_REL / REL_Y 1
E: 63796.227912 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +40ms
E: 63796.277549 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.277549 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +50ms
E: 63796.436793 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.436793 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +159ms
E: 63796.546114 0002 0001 0001 # EV_REL / REL_Y 1
E: 63796.546114 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +110ms
E: 63796.606765 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.606765 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +60ms
E: 63796.786510 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.786510 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +180ms
E: 63796.885943 0002 0001 0001 # EV_REL / REL_Y 1
E: 63796.885943 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +99ms
E: 63796.956703 0002 0000 -001 # EV_REL / REL_X -1
E: 63796.956703 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +71ms
This was me pressing lightly but with perceived constant pressure, and the time stamps between events go from 20ms to 180ms. Remember what I said above about unreliable deltas? Yeah, that.
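To make that variability concrete, here is a minimal sketch of the arithmetic any consumer of these events ends up doing. This is not libinput code, just an illustration, with intervals picked from the recording above:

#include <stdio.h>

/* Hypothetical samples loosely based on the low-pressure recording above:
 * a delta in device units and the interval to the previous event in ms. */
struct sample { int delta; double interval_ms; };

int main(void)
{
    struct sample samples[] = {
        { 1, 40.0 }, { 1, 50.0 }, { 1, 159.0 }, { 1, 110.0 }, { 1, 60.0 },
    };
    int n = sizeof(samples) / sizeof(samples[0]);

    for (int i = 0; i < n; i++) {
        /* With identical deltas of 1, the speed is determined entirely by
         * the interval between events. */
        double speed = samples[i].delta / samples[i].interval_ms;
        printf("delta %d over %5.1fms -> %.3f units/ms\n",
               samples[i].delta, samples[i].interval_ms, speed);
    }
    return 0;
}

Even with every delta pinned at 1, the computed speed varies by roughly a factor of four, and that is what any velocity estimation downstream has to smooth over.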

Here's an event sequence from a trackpoint at a pressure that triggers almost constant reporting:


E: 72743.926045 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.926045 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.926045 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 72743.939414 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.939414 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.939414 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +13ms
E: 72743.949159 0002 0000 -002 # EV_REL / REL_X -2
E: 72743.949159 0002 0001 -002 # EV_REL / REL_Y -2
E: 72743.949159 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 72743.956340 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.956340 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.956340 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +7ms
E: 72743.978602 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.978602 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.978602 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +22ms
E: 72743.989368 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.989368 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.989368 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +11ms
E: 72743.999342 0002 0000 -001 # EV_REL / REL_X -1
E: 72743.999342 0002 0001 -001 # EV_REL / REL_Y -1
E: 72743.999342 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 72744.009154 0002 0000 -001 # EV_REL / REL_X -1
E: 72744.009154 0002 0001 -001 # EV_REL / REL_Y -1
E: 72744.009154 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +10ms
E: 72744.018965 0002 0000 -002 # EV_REL / REL_X -2
E: 72744.018965 0002 0001 -003 # EV_REL / REL_Y -3
E: 72744.018965 0000 0000 0000 # ------------ SYN_REPORT (0) ---------- +9ms
Note how there is an event in there with a 22ms interval? Maintaining constant pressure is hard. You can re-create the above recordings by running evemu-record.

Pressing hard I get deltas up to maybe 5. That's staying within the second pressure range outlined above; I can force higher deltas but what's the point. So the dynamic range for deltas alone is terrible - we have a grand total of 5 values across the comfortable range.

Changing the sensitivity setting higher than the default will send higher deltas, including deltas greater than 1 before reaching the report rate. Setting it to lower than the default (does anyone do that?) sends smaller deltas. But doing so means changing the hardware properties, similar to how some gaming mice can switch dpi on the fly.

I leave you with a fun thought exercise in correlation vs. causation: your trackpoint uses PS/2, your touchpad probably uses PS/2. Your trackpoint has a reporting rate of 100Hz but when you touch the touchpad half the bandwidth is used by the touchpad. So your trackpoint sends half the events when you have the palm resting on the touchpad. From my observations, the deltas don't double in size. In other words, your trackpoint just slows down to roughly half the speed. I can reduce the reporting rate to approximately a third by putting two or more fingers onto the touchpad. Trackpoints haven't changed that much over the years but touchpads have. So the takeaway is: 10 years ago touchpads were smaller and trackpoints were faster. Simply because you could use them without touching the touchpad. Mind blown (if true, measuring these things is hard...)

Well, that was fun, wasn't it. I'm glad you stayed that long, because I did and it'd feel lonely otherwise. In the next post I'll outline the pointer acceleration curves for trackpoints and what we're going to do about that. Besides despairing, that is.

[1] I doubt you will be, but it always pays to be prepared.
[2] In this post I'm using "pressure" as side-ways pressure, not downwards pressure. Some trackpoints can handle downwards pressure and modify the acceleration based on it (or expect userland to do so).
[3] Not that this number is always correct, the Lenovo CompactKeyboard USB with Trackpoint has a default sensitivity of 5 - any laptop trackpoint would be unusable at that low a value (their default is 128).
[4] I honestly don't know this for sure but ages ago I found a hw spec document that actually detailed the process. Search for "TrackPoint System Version 4.0 Engineering Specification", page 43, "2.6.2 DIGITAL TRANSFER FUNCTION".
June 06, 2018

Thanks to Daniel Stone's efforts, libinput is now on gitlab. For a longer explanation on the move from the old freedesktop infrastructure (cgit, bugzilla, etc.) to the gitlab instance hosted by freedesktop.org, see this email.

All open bugs have been migrated from bugzilla to gitlab too, the documentation has been updated accordingly, and we're ready to go. The new base URL for libinput in gitlab is: https://gitlab.freedesktop.org/libinput/.

June 04, 2018

The quiescent current of the APA102 2020 LEDs is about 0.7-1.0mA, which is rather high for some use cases - with a strip of 100 LEDs that already adds up to 70-100mA of idle draw, a lot for anything battery powered. But recently some new options have been made available.

They are the APA102 1515 and the APA104 1515, both of which come in a 1.5x1.5mm package. Additionally the APA104 1515 IC has a quiescent current of 0.3mA, which is a good step in the right direction.

APA102 1515 datasheet

APA104 1515 datasheet

Unfortunately only the APA104 datasheet specifies the quiescent current, so the quiescent current of the APA102 1515 is still unknown.

May 30, 2018

Early this month, I started upstreaming the prerequisite patches for Dave Stevenson’s MMAL V4L2 codec driver. The Kodi developers have tested that their pipeline works on the KMS backend using that driver, so once we get this upstream it should mean full media decode acceleration support for Kodi and gstreamer-based applications. All the prep patches were quickly accepted by gregkh, and now the MMAL camera driver will automatically probe at boot time when enabled.

The next week, I took a trip out to Cambridge to meet up with GNOME developers to work on performance issues on their software stack with lower-spec devices.

The primary task I adopted during the hackfest was building up support for capturing DRM events into sysprof. For anyone who hasn’t used it, sysprof is a beautiful tool for doing system-wide sampling profiling. Its maintainer, Christian Hergert, has been building up some timeline infrastructure in it, so you can see CPU utilization over time and look at samples within a chosen time range. By including some useful perf tracepoints in our captures too, we can see when vblank is and when GPU jobs were running relative to it.

vc4 timeline

That’s the first cut of throwing the events from a Raspberry Pi running fullscreen glxgears into the UI – you’ve got the GPU’s binner on top, then the GPU’s renderer, and the vblank event at the bottom. These aren’t great yet, but for a couple of days of hacking it’s a good start. We should be able to do AMDGPU, V3D, and etnaviv timelines easily as well, but i915 unfortunately hides the necessary tracepoints under an off-by-default Kconfig option.

There exist other tools that capture GPU timelines already, like gpuvis. However, gpuvis seems to be composed of a bunch of shell scripts and commands on a wiki, and isn’t packaged on debian. Incorporating support for DRM events into sysprof means that anyone updating the package will get this code, complete with nice authentication for getting the privileges necessary to start profiling. While I was there, Mathias worked on having GTK provide tracepoints from some of its notable codepaths, and Jonas worked on having gnome-shell get tracepoints as well. Hopefully, all of these tracepoints glued together can give us some visualization of what went wrong across the system when we miss a vblank.

Ultimately, though, I’m skeptical of GNOME 3 ever being usable on a Raspberry Pi. The clutter-based gnome-shell painting is too slow (60% of a CPU burned in the shell just trying to present a single 60fps glxgears), and there doesn’t seem to be a plan for improving it other than “maybe we’ll delete clutter some day?” Also, the javascript extension system being in the compositor thread means that you drop application frames when something else (network state changes, notifications, etc.) happens in the system. This was a bad software architecture choice, and digging out of that hole now would take a long time. (I’m agnostic on whether it was wrong to move those into the same process as the compositor, but the same thread was definitely wrong). I’ll keep working on the debugging tools to try to enable anyone to work on these problems, though.

After the hackfest, I stopped by the Raspberry Pi offices to talk multimedia again. Dave Stevenson’s going to take on upstreaming the v4l2 codec driver, having seen how easy the camera upstreaming to staging was. I respun the SAND display patches to get them upstreamed, so we can start talking about SAND with V4L2 soon. I also did some review of the core DRM writeback connector patches, so that hopefully we can push Boris’s VC4 support and be able to use that from things like HWC2 to do scene flattening.

While at the hotel, I tried out Stefan Schake’s vc4 fencing patches, wrote a couple of little fixes, and pushed the lot to Mesa.

For V3D, I wrote a patch series for supporting a bunch of GLES3 bitfield conversion operations on HW with no explicit instructions for them. Unfortunately, as the first one with hardware needing them, there hasn’t been any review yet. I also implemented glSampleMask/glSampleCoverage, and fixed some NaN bugs. Finally, I pushed the patches enabling the Mesa driver by default now that the kernel side has been accepted upstream!

After that busy couple of weeks, I took a week off at Portland’s regional burn, and I’m back to work now.

May 29, 2018

I am very happy to see that Benjamin Tissoires' work to enable the Dell Canvas and Totem has started to land in the upstream kernel. This work is the result of a collaboration between ourselves at Red Hat and Dell to bring this exciting device to Linux users.

Dell Canvas 27

Dell Canvas

The Dell Canvas and totem are essentially a graphics tablet with a stylus plus a turnable knob that can be placed onto the tablet surface. Dell features some videos on their site showcasing the Dell Canvas being used in areas such as drawing, video editing and CAD.

For Linux applications that already support graphics drawing tablets, the Canvas should work once this lands, but where we hope to see application developers step up is in adding support in their applications for the totem. I have been pondering how we could help make that happen, as we would be happy to donate a Dell Canvas to help kickstart application support, I am just unsure about the best way to go ahead.
I was considering offering one as a prize for the first application to add support for the totem, but that seems to be a chicken and egg problem by definition. Does anyone have suggestions for how to get one of these into the hands of the developer most interested in and able to take advantage of it?

libinput 1.11 is just around the corner and one of the new features added are the libinput-record and libinput-replay tools. These are largely independent of libinput itself (libinput-replay is a python script) and replace the evemu-record and evemu-replay tools. The functionality is roughly the same with a few handy new features. Note that this is a debugging tool, if you're "just" a user, you may never have to use either tool. But for any bug report expect me to ask for a libinput-record output, same as I currently ask everyone for an evemu recording.

So what does libinput-record do? Simple - it opens an fd to a kernel device node and reads events from it. These events are converted to YAML and printed to stdout (or the provided output file). The output is a combination of machine-readable information and human-readable comments. Included in the output are the various capabilities of the device but also some limited system information like the kernel version and the dmi modalias. The YAML file can be passed to libinput-replay, allowing me to re-create the event device on my test machines and hopefully reproduce the bug. That's about it. evemu did exactly the same thing and it has done wonders for how efficiently we could reproduce and fix bugs.

Alas, evemu isn't perfect. It's nearly 8 years old now and its API is a bit crufty. Originally, two separate tools generated two separate (machine-readable only) files, and two different tools were needed for creating the device and replaying the events. Over the years it got more useful: now we only have one tool each to record or replay events and the file includes human-readable comments. But we're hitting limits, its file format is very inflexible and the API is no better. So we'd have to add a new file format and the required parsing, break the API, deal with angry users, etc. Not worth it.

Thus libinput-record is the replacement for evemu. The main features that libinput-record adds are a more standardised file format that can be expanded and parsed easily, the ability to record and replay multiple devices at once, and the interleaving of evdev events with libinput events to check what's happening. And it's more secure by default: all alphanumeric keys are (by default) printed as KEY_A so there's no risk of a password leaking into a file attached to Bugzilla. evemu required python bindings; for libinput-record's output format we don't need those since you can just access the YAML as an array in Python. And finally - it's part of libinput, which means it's going to be easier to install (because distributions won't just ignore libinput) and it's going to be more up-to-date (because if you update libinput, you get the new libinput-record).

It's new code so it will take a while to iron out any leftover bugs but after that it'll be the glorious future ;)

May 20, 2018

The All Systems Go! 2018 Call for Participation is Now Open!

The Call for Participation (CFP) for All Systems Go! 2018 is now open. We’d like to invite you to submit your proposals for consideration to the CFP submission site.

ASG image

The CFP will close on July 30th. Notification of acceptance and non-acceptance will go out within 7 days of the closing of the CFP.

All topics relevant to foundational open-source Linux technologies are welcome. In particular, however, we are looking for proposals including, but not limited to, the following topics:

  • Low-level container executors and infrastructure
  • IoT and embedded OS infrastructure
  • BPF and eBPF filtering
  • OS, container, IoT image delivery and updating
  • Building Linux devices and applications
  • Low-level desktop technologies
  • Networking
  • System and service management
  • Tracing and performance measuring
  • IPC and RPC systems
  • Security and Sandboxing

While our focus is definitely more on the user-space side of things, talks about kernel projects are welcome, as long as they have a clear and direct relevance for user-space.

For more information please visit our conference website!

May 17, 2018

A textured cube with a reflection

Panfrost, the libre Midgard driver, has reached a number of new milestones, culminating in the above screenshot, demonstrating:

  • Textures! The bug preventing textures was finally cracked. I implemented support for textures in the compiler and through Gallium; I integrated some texture swizzling from limare’s source code; et voila, textures.

  • Multiple shaders! Previously, the Midgard, NIR-based compiler and the Gallium driver were separate; the compiler read GLSL from the disk, writing back compiled binaries to disk, and the driver would read these binaries. While this path is still used in the standalone test driver, the Gallium driver is now integrated with the compiler directly, enabling “online compilation”. Among numerous other benefits, multiple shaders can now be used in the same program.

  • The stencil test in the Gallium driver. The scissor test could have been used instead, but the stencil test can generalise to non-rectangular surfaces. Additionally, the mechanics of the stencil test on Midgard hardware are better understood than those of the scissor test for the time being.

  • Blending (partial support), again through Mesa. Currently, only fixed-function blending is supported; implementing “blend shaders” is low-priority due to their rarity, complexity, and performance.

  • My love for My Little Pony. Screenshot is CC BY-SA, a derivative of PonyToast’s photograph of the Element of Generosity’s voice actress.

Warning: the following is ruthlessly technical and contains My Little Pony references. Proceed at your own risk.

Textures are by far the most significant addition. Although work decoding their command stream and shader instructions had commenced months ago, we hadn’t managed to replay a texture until May 3, let alone implement support in the driver. The lack of functional textures was the only remaining showstopper. We had poured in long hours debugging it, narrowing down the problem to the command stream, but nothing budged. No permutation of the texture descriptor or the sampler descriptor changed the situation. Yet, everyone was sure that once we figured it out, it would have been something silly in hindsight.

It was.

OpenGL’s textures in the command stream are controlled by the texture and sampler descriptors, corresponding to Vulkan’s textures and samplers respectively. They were the obvious place to look for bugs.

They were not the culprit.

Where did the blame lie, then?

The shader descriptor.

Midgard’s shader descriptor, a block in the command stream which configures a shader, has a number of fields: the address of the compiled shader binary, the number of registers used, the number of attributes/varyings/uniforms, and so forth. I thought that was it. A shader descriptor from a replay with textures looked like this (reformatted for clarity):

struct mali_shader_meta shader_meta = {
    .shader = (shader_memory + 1920) | 5,
    // XXX shader zero tripped
    .attribute_count = 0,
    .varying_count = 1,
    .uniform_registers = (0 << 20) | 0x20e00,
};

That is, the shader code is at shader_memory + 1920; as a fragment shader, it uses no attributes but it does receive a single varying; it does not use any uniforms. All accounted for, right?

What’s that comment about, “XXX shader zero tripped”, then?

There are frequently fields in the command stream that we observe to always be zero, for various reasons. Sometimes they are there for padding and alignment. Sometimes they correspond to a feature that none of our tests had used yet. In any event, it is distracting for a command stream log to be filled with lines like:

.zero0 = 0,
.zero1 = 0,
.zero2 = 0,
.zero3 = 0,

In an effort to keep everything tidy, fields that were observed to always be zero are not printed. Instead, the tracer just makes sure that the unprinted fields (which default to zero by the C compiler) are, in fact, equal to zero. If they are not, a warning is printed, stating that a “zero is tripped”, as if the field were a trap. When the reader of the log sees this line, they know that the replay is incomplete, as they are missing a value somewhere; a field was wrongly marked as “always zero”. It was a perfect system.

At least, it would have been a perfect system, if I had noticed the warning.

I was hyper-focused on the new texture and sampler descriptors, on the memory allocations for the texture memory itself, on the shader binaries – I was so hyper-focused on textures that I only skimmed the rest of the log for anomalies.

If I had looked closer – as I finally did, on that fateful Thursday – I would have realised that the zero was tripped. I would have committed a change like:

-               if (t->zero1)
-                       panwrap_msg("XXX shader zero tripped\n");
+               //if (t->zero1)
+               //      panwrap_msg("XXX shader zero tripped\n");
 
+               panwrap_prop("zero1 = %" PRId16, t->zero1);
        panwrap_prop("attribute_count = %" PRId16, t->attribute_count);
        panwrap_prop("varying_count = %" PRId16, t->varying_count);

I would have then discovered that “zero1” was mysteriously equal to 65537 for my sample with a texture. And I would have noticed that suddenly, texture replay worked!

Everything fell into place from then. Notice that 65537 in decimal is equal to 0x10001 in hex. With some spacing included for clarity, that’s 0x 0001 0001. Alternatively, instead of a single 32-bit word, it can be interpreted as two 16-bit integers: two ones in succession. What two things do we have one of in the command stream?

Textures and samplers!

Easy enough to handle in the command stream:

    mali_ptr shader;
-       u32 zero1;
+
+       u16 texture_count; 
+       u16 sampler_count;
 
    /* Counted as number of address slots (i.e. half-precision vec4's) */
    u16 attribute_count;

After that, it was just a matter of moving code from the replay into the real driver, writing functions to translate Gallium commands into Midgard structures, implementing a routine in the compiler to translate NIR instructions to Midgard instructions, and a lot of debugging. A week later, all the core code for textures was in place… almost.

The other big problem posed by textures is their internal format. In some graphics systems, textures are linear, the most intuitive format; that is, a pixel is accessed in the texture by texture[y*stride + x]. However, for reasons of cache locality, this format is a disaster for a GPU; instead, textures are stored “tiled” or “swizzled”. This article offers a good overview of tiled texture layouts.
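To give a rough idea of what tiling means, here is a toy example contrasting linear addressing with a simple 4x4 block layout. This is emphatically not the Midgard layout (that is what the limare code decodes); real layouts typically also reorder pixels within each block:

#include <stdint.h>

/* Linear layout: adjacent x coordinates are adjacent in memory, but moving
 * one step in y jumps a full row ahead - bad for 2D-local texture accesses. */
static inline uint32_t
linear_index(uint32_t x, uint32_t y, uint32_t stride)
{
    return y * stride + x;
}

/* Toy 4x4 block layout: all pixels of a 4x4 tile are stored contiguously,
 * so a small 2D neighbourhood touches far fewer cache lines. Assumes the
 * texture width is a multiple of 4. */
static inline uint32_t
tiled_4x4_index(uint32_t x, uint32_t y, uint32_t width)
{
    uint32_t tiles_per_row = width / 4;
    uint32_t tile_x = x / 4, tile_y = y / 4;
    uint32_t within_tile = (y % 4) * 4 + (x % 4); /* row-major inside the tile */

    return (tile_y * tiles_per_row + tile_x) * 16 + within_tile;
}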

Texture tiling is great and powerful for hardware. It is less great and powerful for driver writers. Decoding the swizzling algorithm would have been a mammoth task, orthogonal to the command stream and shader work for textures. 3D drivers are complex – textures have three major components that are each orthogonal to each other.

It would have been hopeless… if libv had not already decoded the layout when writing limare! The heavy lifting was done, all released under the MIT license. In an afternoon’s work, I extracted the relevant code from limare, cleaned it up a bit, and made it about 20% faster (abstract rounding). The resulting algorithm is still somewhat opaque, but it works! In a single thread on my armv7 RK3288 laptop, about 355 RGBA32 1080p textures can be swizzled in 10 seconds flat.

I then integrated the swizzling code with the Gallium driver, et voilà, really – no, no, this time it’s true – I’m not lying! – er, well, I had to finish a few other tasks before I could demo test-cube-textured, but… voilà! (Sorry for speaking Prancy.)

Textures?

Textures!


On the Bifrost side, Lyude Paul has continued her work writing an assembler. The parser – a surprisingly complex piece given the nuances of the ISA – is now working reliably. Code emission is in its nascent stages, and her assembler is now making progress on instruction encoding; the first instructions have almost been emitted. May many more instructions follow.

However, an assembler for Bifrost is no good without a free driver to use it with; accordingly, Connor Abbott has continued his work investigating the Bifrost command stream. It continues to demonstrate considerable similarities to Midgard; luckily, much of the driver code will be shareable between the architectures. Like the assembler, this work is still in early stages, implemented in a personal branch, but early results look promising.

And a little birdy told me that there might be T880 support in the pipes.

May 14, 2018

This post is part of a four part series: Part 1, Part 2, Part 3, Part 4.

In the first three parts, I covered the X server and synaptics pointer acceleration curves and how libinput compares to the X server pointer acceleration curve. In this post, I will compare libinput to the synaptics acceleration curve.

Comparison of synaptics and libinput

libinput has multiple different pointer acceleration curves, depending on the device. In this post, I will only consider the one used for touchpads. So let's compare the synaptics curve with the libinput curve at the default configurations:

But this graph doesn't tell the whole story, because the touchpad accel for libinput actually changes once we get faster. So here are the same two curves, but this time with the range up to 1000mm/s. These two graphs show that libinput is both very different and yet similar. Both curves have an acceleration factor of less than 1 for the majority of speeds, i.e. they both decelerate the touchpad more than they accelerate it. synaptics has two factors it sticks to and a short curve; libinput has a short deceleration curve and its plateau is the same or lower than synaptics' for the most part. Once the threshold is hit at around 250 mm/s, libinput's acceleration keeps increasing until it hits a maximum much later.

So, anything under ~20mm/s, libinput should be the same as synaptics (ignoring the <7mm/s deceleration). For anything less than 250mm/s, libinput should be slower. I say "should be" because that is not actually the case, synaptics is slower so I suspect the server scaling slows down synaptics even further. Hacking around in the libinput code, I found that moving libinput's baseline to 0.2 matches the synaptics cursor's speed. However, AFAIK that scaling depends on the screen size, so your mileage may vary.

Comparing configuration settings

Let's overlay the libinput speed toggles. In Part 2 we've seen the synaptics toggles and they're open-ended, so it's a bit hard to pick a specific set to go with to compare. I'll be using the same combined configuration options from the diagram there.

And we need the diagram from 0-1000mm/s as well. There isn't much I can talk about here in direct comparison, the curves are quite different and the synaptics curves vary greatly with the configuration options (even though the shape remains the same).

Analysis

It's fairly obvious that the acceleration profiles are very different once we depart from the default settings. Most notably, only libinput's slowest speed setting matches the 0.2 speed that is the synaptics default setting. In other words, if your touchpad is too fast compared to synaptics, it may not be possible to slow it down sufficiently. Likewise, even at the fastest speed, the baseline is well below the synaptics baseline for e.g. 0.6 [1], so if your touchpad is too slow, you may not be able to speed it up sufficiently (at least for low speeds). That problem doesn't exist for the maximum acceleration factor; the main question there is simply whether it is too high. Answer: I don't know.

So the base speed of the touchpad in libinput needs a higher range, that's IMO a definitive bug that I need to work on. The rest... I don't know. Let's see how we go.

[1] A configuration I found suggested in some forum when googling for MinSpeed, so let's assume there's at least one person out there using it.

This post is part of a four part series: Part 1, Part 2, Part 3, Part 4.

In Part 1 and Part 2 I showed the X server acceleration code as used by the evdev and synaptics drivers. In this part, I'll show how it compares against libinput.

Comparison to libinput

libinput has multiple different pointer acceleration curves, depending on the device. In this post, I will only consider the default one used for mice. A discussion of the touchpad acceleration curve comes later. So, back to the graph of the simple profile. Let's overlay this with the libinput pointer acceleration curve:

Turns out that libinput's pointer acceleration curve, mostly modeled after the xserver behaviour, roughly matches the xserver behaviour. Note that libinput normalizes to 1000dpi (provided MOUSE_DPI is set correctly) and thus the curves only match this way for 1000dpi devices.

libinput's deceleration is slightly different but I doubt it is really noticeable. The plateau of no acceleration is virtually identical, i.e. at slow speeds libinput moves like the xserver's pointer does. Likewise for speeds above ~33mm/s, libinput and the server accelerate by the same amount. The actual curve is slightly different. It is a linear curve (I doubt that's noticeable) and it doesn't have that jump in it. The xserver acceleration maxes out at roughly 20mm/s. The only difference in acceleration is for the range of 10mm/s to 33mm/s.

30mm/s is still a relatively slow movement (just move your mouse by 30mm within a second, it doesn't feel fast). This means that for all but slow movements, the current server and libinput acceleration provide just a flat acceleration at whatever the maximum acceleration factor is set to.

Comparison of configuration options

The biggest difference between libinput and the X server is that libinput exposes a single knob of normalised continuous configuration (-1.0 == slowest, 1.0 == fastest). It relies on settings like MOUSE_DPI to provide enough information to map a device into that normalised range.
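For reference, setting that knob programmatically looks roughly like the sketch below. It assumes you already have a struct libinput_device * from an initialised libinput context; the clamping is mine, just to keep the request inside the documented range:

#include <libinput.h>

/* Minimal sketch: clamp a requested speed into the normalised [-1.0, 1.0]
 * range and apply it to a device that supports acceleration configuration. */
static int
set_pointer_speed(struct libinput_device *device, double speed)
{
    if (!libinput_device_config_accel_is_available(device))
        return -1;

    if (speed < -1.0)
        speed = -1.0;
    else if (speed > 1.0)
        speed = 1.0;

    if (libinput_device_config_accel_set_speed(device, speed) !=
        LIBINPUT_CONFIG_STATUS_SUCCESS)
        return -1;

    return 0;
}

In practice most users never call this directly; the xf86-input-libinput driver and the Wayland compositors expose the same knob through their own configuration.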

Let's look at the libinput speed settings and their effect on the acceleration profile (libinput 1.10.x).

libinput's speed setting is a combination of changing thresholds and accel at the same time. The faster you go, the sooner acceleration applies and the higher the maximum acceleration is. For very slow speeds, libinput provides deceleration. Noticeable here though is that the baseline speed is the same until we get to speed settings of less than -0.5 (where we have an effectively flat profile anyway). So up to the (speed-dependent) threshold, the mouse speed is always the same.

Let's look at the comparison of libinput's speed setting to the accel setting in the simple profile:

Clearly obvious: libinput's range is a lot smaller than what the accel setting allows (that one is effectively unbounded). This obviously applies to the deceleration as well. I'm not posting the threshold comparison, as Part 1 shows it does not affect the maximum acceleration factor anyway.

Analysis

So, where does this leave us? I honestly don't know. The curves are different but the only paper I could find on comparing acceleration curves is Casiez and Roussel's 2011 UIST paper. It provides a comparison of the X server acceleration with the Windows and OS X acceleration curves [1]. It shows quite a difference between the three systems but the authors note that no specific acceleration curve is definitely superior. However, the most interesting bit here is that both the Windows and the OS X curve seem to be constant acceleration (with very minor changes) rather than changing the curve shape.

Either way, there is one possible solution for libinput to implement: to change the base plateau with the speed. Otherwise libinput's acceleration curve is well defined for the configurable range. And a maximum acceleration factor of 3.5 is plenty for a properly configured mouse (generally anything above 3 is tricky to control). AFAICT, the main issues with pointer acceleration come from mice that either don't have MOUSE_DPI set or trackpoints which are, unfortunately, a completely different problem.

I'll probably also give the Windows/OS X approach a try (i.e. same curve, different constant deceleration) and see how that goes. If it works well, that may be a solution because it's easier to scale into a large range. Otherwise, *shrug*, someone will have to come up with a better solution.

[1] I've never been able to reproduce the same gain (== factor) but at least the shape and x axis seems to match.

This post is part of a four part series: Part 1, Part 2, Part 3, Part 4.

In Part 1 I showed the X server acceleration code as used by the evdev driver (which leaves all acceleration up to the server). In this part, I'll show the acceleration code as used by the synaptics touchpad driver. This driver installs a device-specific acceleration profile but beyond that the acceleration is... difficult. The profile itself is not necessarily indicative of the real movement, the coordinates are scaled between device-relative, device-absolute, screen-relative, etc. so often that it's hard to keep track of what the real delta is. So let's look at the profile only.

Diagram generation

Diagrams were generated by gnuplot, parsing .dat files generated by the ptrveloc tool in the git repo. Helper scripts to regenerate all data are in the repo too. Default values unless otherwise specified:

  • MinSpeed: 0.4
  • MaxSpeed: 0.7
  • AccelFactor: 0.04
  • dpi: 1000 (used for converting units to mm)
All diagrams are limited to 100 mm/s and a factor of 5 so they are directly comparable. From earlier testing I found movements above 300 mm/s are rare; once you hit 500 mm/s the acceleration doesn't really matter that much anymore, you're going to hit the screen edge anyway.

The choice of 1000 dpi is a difficult one. It makes the diagrams directly comparable to those in Part 1, but touchpads have a great variety in their resolution. For example, an ALPS DualPoint touchpad may have resolutions of 25-32 units/mm. A Lenovo T440s has a resolution of 42 units/mm over PS/2 but 20 units/mm over the newer SMBus/RMI4 protocol. This is the same touchpad. Overall it doesn't actually matter that much though, see below.

The acceleration profile

This driver has a custom acceleration profile, configured by the MinSpeed, MaxSpeed and AccelFactor options. The former two put a cap on the factor but MinSpeed also adjusts (overwrites) ConstantDeceleration. The AccelFactor defaults to a device-specific size based on the device diagonal.

Let's look at the defaults of 0.4/0.7 for min/max and 0.04 (default on my touchpad) for the accel factor:

The simple profile from part 1 is shown in this graph for comparison. The synaptics profile is printed as two curves, one for the profile output value and one for the real value used on the delta. Unlike the simple profile you cannot configure ConstantDeceleration separately, it depends on MinSpeed. Thus the real acceleration factor is always less than 1, so the synaptics driver doesn't accelerate as such, it controls how much the deltas are decelerated.

The actual acceleration curve is just a plain old linear interpolation between the min and max acceleration values. If you look at the curves more closely you'll find that there is no acceleration up to 20mm/s and flat acceleration from 25mm/s onwards. Only in this small speed range does the driver adjust its acceleration based on input speed. Whether this is intentional or just happened to turn out this way, I don't know.
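As a mental model only - this is a hypothetical simplification, not the driver's actual function, and it ignores the MinSpeed/ConstantDeceleration coupling mentioned above - the shape boils down to something like:

/* Hypothetical sketch of a MinSpeed/MaxSpeed/AccelFactor-style profile:
 * a linear ramp scaled by AccelFactor, clamped between the two configured
 * values. Not the synaptics driver's real code. */
static double
touchpad_profile(double speed, double min_speed, double max_speed,
                 double accel_factor)
{
    double factor = speed * accel_factor;

    if (factor < min_speed)
        factor = min_speed;    /* flat section at low speeds */
    if (factor > max_speed)
        factor = max_speed;    /* flat section at high speeds */
    return factor;
}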

The accel factor depends on the touchpad x/y axis. On my T440s using PS/2, the factor defaults to 0.04. If I get it to use SMBus/RMI4 instead of PS/2, that same device has an accel factor of 0.09. An ALPS touchpad may have a factor of 0.13, based on the min/max values for the x/y axes. These devices all have different resolutions though, so here are the comparison graphs taking the axis range and the resolution into account:

The diagonal affects the accel factor, so these three touchpads (two curves are the same physical touchpad, just using a different bus) get slightly different acceleration curves. They're more similar than I expected though, and for the rest of this post we can get away with just looking at the 0.04 default value from my touchpad.

Note that due to how synaptics is handled in the server, this isn't the whole story, there is more coordinate scaling etc. happening after the acceleration code. The synaptics acceleration profile also does not accommodate for uneven x/y resolutions, this is handled in the server afterwards. On touchpads with uneven resolutions the velocity thus depends on the vector, moving along the x axis provides differently sized deltas than moving along the y axis. However, anything applied later isn't speed dependent but merely a constant scale, so these curves are still a good representation of what happens.

The effect of configurations

What does the acceleration factor do? It changes when acceleration kicks in and how steep the acceleration is.

And how do the min/max values play together? Let's adjust MinSpeed but leave MaxSpeed at 0.7.

MinSpeed lifts the baseline (i.e. the minimum acceleration factor), somewhat expected from a parameter named this way. But it looks again like we have a bug here. When MinSpeed and MaxSpeed are close together, our acceleration actually decreases once we're past the threshold. So counterintuitively, a higher MinSpeed can result in a slower cursor once you move faster.

MaxSpeed is not too different here:

The same bug is present, if the MaxSpeed is smaller or close to MinSpeed, our acceleration actually goes down. A quick check of the sources didn't indicate anything enforcing MinSpeed < MaxSpeed either. But otherwise MaxSpeed lifts the maximum acceleration factor.

These graphs look at the options in separation, in reality users would likely configure both MinSpeed and MaxSpeed at the same time. Since both have an immediate effect on pointer movement, trial and error configuration is simple and straightforward. Below is a graph of all three adjusted semi-randomly:

No surprises in there: the baseline (and thus slowest speed) changes, the maximum acceleration changes, and how long it takes to get there changes. The curves vary quite a bit though, so without knowing the configuration options, it's impossible to predict how a specific touchpad behaves.

Epilogue

The graphs above show the effect of configuration options in the synaptics driver. I purposely didn't put any specific analysis in and/or compare it to libinput. That comes in a future post.

This post is part of a four part series: Part 1, Part 2, Part 3, Part 4.

Over the last few days, I once again tried to tackle pointer acceleration. After all, I still get plenty of complaints about how terrible libinput is and how the world was so much better without it. So I once more tried to understand the X server's pointer acceleration code. Note: the evdev driver doesn't do any acceleration, it's all handled in the server. Synaptics will come in part two, so this here focuses mostly on pointer acceleration for mice/trackpoints.

After a few failed attempts of live analysis [1], I finally succeeded extracting the pointer acceleration code into something that could be visualised. That helped me a great deal in going back and understanding the various bits and how they fit together.

The approach was: copy the ptrveloc.(c|h) files into a new project, set up a meson.build file, #define all the bits that are assumed to be there and voila, here's your library. Now we can build basic analysis tools provided we initialise all the structs the pointer accel code needs correctly. I think I succeeded. The git repo is here if anyone wants to check the data. All scripts to generate the data files are in the repository.

A note on language: the terms "speed" and "velocity" are subtly different but for this post the difference doesn't matter. The code uses "velocity" but "speed" is more natural to talk about, so just assume equivalence.

The X server acceleration code

There are 15 configuration options for pointer acceleration (ConstantDeceleration, AdaptiveDeceleration, AccelerationProfile, ExpectedRate, VelocityTrackerCount, Softening, VelocityScale, VelocityReset, VelocityInitialRange, VelocityRelDiff, VelocityAbsDiff, AccelerationProfileAveraging, AccelerationNumerator, AccelerationDenominator, AccelerationThreshold). Basically, every number is exposed as configurable knob. The acceleration code is a product of a time when we were handing out configuration options like participation medals at a children's footy tournament. Assume that for the rest of this blog post, every behavioural description ends with "unless specific configuration combinations apply". In reality, I think only four options are commonly used: AccelerationNumerator, AccelerationDenominator, AccelerationThreshold, and ConstantDeceleration. These four have immediate effects on the pointer movement and thus it's easy to do trial-and-error configuration.

The server has different acceleration profiles (called the 'pointer transfer function' in the literature). Each profile is a function that converts speed into a factor. That factor is then combined with other things like constant deceleration, but eventually our output delta forms as:


delta_out(x, y) = delta_in(x, y) * factor * deceleration
The output delta is passed back to the server and the pointer saunters over by a few pixels, happily bumping into any screen edge on the way.

The input for the acceleration profile is a speed in mickeys, a threshold (in mickeys) and a max accel factor (unitless). Mickeys are a bit tricky: a mickey is one unit of relative movement as reported by the device, so its physical size depends on the device resolution. This means the acceleration is device-specific; the deltas for a mouse at 1000 dpi are 25% larger than the deltas for a mouse at 800 dpi (assuming the same physical distance and speed). The "Resolution" option in evdev can work around this, but by default this means that the acceleration factor is (on average) higher for high-resolution mice for the same physical movement. It also means that the xorg.conf snippet you found on stackoverflow probably does not do the same on your device.

The second problem with mickeys is that they require a frequency to map to a physical speed. If a device sends events every N ms, delta/N gives us a speed in units/ms. But we need mickeys for the profiles. Devices generally have a fixed reporting rate and the speed of each mickey is the same as (units/ms * reporting rate). This rate defaults to 10 in the server (the VelocityScaling default value) and thus matches a device reporting at 100Hz (a discussion of this comes later). All graphs below were generated with this default value.
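As a rough sketch of that unit conversion - a hypothetical helper, not the server's velocity tracker, which averages over several deltas (see below):

/* Convert one relative delta plus the time since the previous event into
 * the speed fed to the acceleration profiles. Hypothetical simplification. */
static double
profile_speed(double delta_units, double interval_ms, double velocity_scale)
{
    double units_per_ms = delta_units / interval_ms;

    /* VelocityScale defaults to 10, i.e. it assumes one event every 10ms
     * (a 100Hz device); the result is the per-event "mickey" speed that the
     * profile functions operate on. */
    return units_per_ms * velocity_scale;
}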

Back to the profile function and how it works: the threshold (usually) defines the minimum speed at which acceleration kicks in. The max accel factor (usually) limits the acceleration. So the simplest algorithm is


if (velocity < threshold)
    return base_velocity;    /* below the threshold: no acceleration */
factor = calculate_factor(velocity);
if (factor > max_accel)
    return max_accel;        /* cap at the configured maximum */
return factor;
In reality, things are somewhere between this simple and "whoops, what have we done".

Diagram generation

Diagrams were generated by gnuplot, parsing .dat files generated by the ptrveloc tool in the git repo. Helper scripts to regenerate all data are in the repo too. Default values unless otherwise specified:

  • threshold: 4
  • accel: 2
  • dpi: 1000 (used for converting units to mm)
  • constant deceleration: 1
  • profile: classic
All diagrams are limited to 100 mm/s and a factor of 5 so they are directly comparable. From earlier testing I found movements above 300 mm/s are rare; once you hit 500 mm/s the acceleration doesn't really matter that much anymore, you're going to hit the screen edge anyway.

Acceleration profiles

The server provides a number of profiles, but I have seen very little evidence that people use anything but the default "Classic" profile. Synaptics installs a device-specific profile. Below is a comparison of the profiles just so you get a rough idea what each profile does. For this post, I'll focus on the default Classic only.

The first thing to point out here is that if you want to have your pointer travel to Mars, the linear profile is what you should choose. This profile is unusable without further configuration to bring the incline down to a more sensible level. Only the simple and limited profiles have a maximum factor, all others increase acceleration indefinitely. The faster you go, the more they accelerate the movement. I find them completely unusable at anything but low speeds.

The classic profile transparently maps to the simple profile, so the curves are identical.

Anyway, as said above, profile changes are rare. The one we care about is the default profile: the classic profile which transparently maps to the simple profile (SimpleSmoothProfile() in the source).

Looks like there's a bug in the profile formula. At the threshold value it jumps from 1 to 1.5 before the curve kicks in. This code was added in ~2008, apparently no-one noticed this in a decade.

The profile has deceleration (accel factor < 1 and thus decreasing the deltas) at slow speeds. This provides extra precision at slow speeds without compromising pointer speed at higher physical speeds.

The effect of config options

Ok, now let's look at the classic profile and the configuration options. What happens when we change the threshold?

First thing that sticks out: one of these is not like the others. The classic profile changes to the polynomial profile at thresholds less than 1.0. *shrug* I think there's some historical reason, I didn't chase it up.

Otherwise, the threshold not only defines when acceleration starts kicking in but it also affects steepness of the curve. So higher threshold also means acceleration kicks in slower as the speed increases. It has no effect on the low-speed deceleration.

What happens when we change the max accel factor? This factor is actually set via the AccelerationNumerator and AccelerationDenominator options (because floats used to be more expensive than buying a house). At runtime, the Xlib function of your choice is XChangePointerControl(). That's what all the traditional config tools use (xset, your desktop environment pre-libinput, etc.).

First thing that sticks out: one is not like the others. When max acceleration is 0, the factor is always zero for speeds exceeding the threshold. No user impact though, the server discards factors of 0.0 and leaves the input delta as-is.

Otherwise it's relatively unexciting, it changes the maximum acceleration without changing the incline of the function. And it has no effect on deceleration. Because the curves aren't linear ones, they don't overlap 100% but meh, whatever. The higher values are cut off in this view, but they just look like a larger version of the visible 2 and 4 curves.

Next config option: ConstantDeceleration. This one is handled outside of the profile but the code is easy enough to follow: it's a basic multiplier applied together with the factor. (I cheated and just did this in gnuplot directly)

Easy to see what happens with the curve here, it simply stretches vertically without changing the properties of the curve itself. If the deceleration is greater than 1, we get constant acceleration instead.

All this means that with the default profile, we have 3 ways of adjusting it. What we can't directly change is the incline, i.e. the actual process of acceleration remains the same.

Velocity calculation

As mentioned above, the profile applies to a velocity so obviously we need to calculate that first. This is done by storing each delta and looking at their direction and individual velocity. As long as the direction remains roughly the same and the velocity between deltas doesn't change too much, the velocity is averaged across multiple deltas - up to 16 in the default config. Of course you can change whether this averaging applies, the max time deltas or velocity deltas, etc. I'm honestly not sure anyone ever used any of these options intentionally or with any real success.

Velocity scaling was explained above (units/ms * reporting rate). The default value for the reporting rate is 10, equivalent to 100Hz. Of the 155 frequencies currently defined in 70-mouse.hwdb, only one is 100 Hz. The most common one here is 125Hz, followed by 1000Hz, followed by 166Hz and 142Hz. Now, the vast majority of devices don't have an entry in the hwdb file, so this data does not represent a significant sample set. But for modern mice, the default velocity scale of 10 is probably off by between 25% and a factor of 10. While this doesn't mean much for the local example (users generally just move the numbers around until they're happy enough), it means that the actual values are largely meaningless for anyone but those with the same hardware.

Of note: the synaptics driver automatically sets VelocityScale to 80Hz. This is correct for the vast majority of touchpads.

Epilogue

The graphs above show the X server's pointer acceleration for mice, trackballs and other devices and the effects of the configuration toggles. I purposely did not put any specific analysis in and/or comparison to libinput. That will come in a future post.

[1] I still have a branch somewhere where the server prints yaml to the log file which can then be extracted by shell scripts, passed on to python for processing and ++++ out of cheese error. redo from start ++++
May 09, 2018

Intro slide

Downloads

If you're curious about the slides, you can download the PDF or the OTP.

Thanks

This post has been a part of work undertaken by my employer Collabora.

I would like to thank the wonderful organizers of OpenTechSummit, specifically @hpdang and @mariobehling for hosting a great event.

May 07, 2018

The Vulkan specification includes a number of optional features that drivers may or may not support, as described in chapter 30.1 Features. Application developers can query the driver for supported features via vkGetPhysicalDeviceFeatures() and then activate the subset they need in the pEnabledFeatures field of the VkDeviceCreateInfo structure passed at device creation time.
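As a minimal sketch of that flow (assuming the physical device and a queue family index have already been chosen, with error handling trimmed):

#include <vulkan/vulkan.h>

/* Query the supported features, enable shaderInt16 if available and create
 * the logical device. Queue family 0 is assumed for brevity; a real
 * application queries the queue families first. */
static VkDevice
create_device_with_int16(VkPhysicalDevice physical_device)
{
    VkPhysicalDeviceFeatures supported;
    vkGetPhysicalDeviceFeatures(physical_device, &supported);

    VkPhysicalDeviceFeatures enabled = { 0 };
    if (supported.shaderInt16)
        enabled.shaderInt16 = VK_TRUE;

    const float priority = 1.0f;
    const VkDeviceQueueCreateInfo queue_info = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
        .queueFamilyIndex = 0,
        .queueCount = 1,
        .pQueuePriorities = &priority,
    };

    const VkDeviceCreateInfo device_info = {
        .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
        .queueCreateInfoCount = 1,
        .pQueueCreateInfos = &queue_info,
        .pEnabledFeatures = &enabled,
    };

    VkDevice device = VK_NULL_HANDLE;
    if (vkCreateDevice(physical_device, &device_info, NULL, &device) != VK_SUCCESS)
        return VK_NULL_HANDLE;

    return device;
}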

In the last few weeks I have been spending some time, together with my colleague Chema, adding support in Anvil, the Intel Vulkan driver in Mesa, for one of these features, called shaderInt16, which we landed in Mesa master last week. This is an optional feature available since Vulkan 1.0. From the spec:

shaderInt16 specifies whether 16-bit integers (signed and unsigned) are supported in shader code. If this feature is not enabled, 16-bit integer types must not be used in shader code. This also specifies whether shader modules can declare the Int16 capability.

It is probably relevant to highlight that this Vulkan capability also requires the SPIR-V Int16 capability, which basically means that the driver’s SPIR-V compiler backend can actually digest SPIR-V shaders that declare and use 16-bit integers, and which is really the core of the functionality exposed by the Vulkan feature.

Ideally, shaderInt16 would increase the overall throughput of integer operations in shaders, leading to better performance when you don’t need a full 32-bit range. It may also provide better overall register usage since you need less register space to store your integer data during shader execution. It is important to remark, however, that not all hardware platforms (Intel or otherwise) may have native support for all possible types of 16-bit operations, and thus, some of them might still need to run in 32-bit (which requires injecting type conversion instructions in the shader code). For Intel platforms, this is the case for operations associated with integer division.

From the point of view of the driver, this is the first time that we generally exercise lower bit-size data types in the driver compiler backend, so if you find any bugs in the implementation, please file bug reports in bugzilla!

Speaking of shaderInt16, I think it is worth mentioning its interactions with other Vulkan functionality that we implemented in the past: the Vulkan 1.0 VK_KHR_16bit_storage extension (which has been promoted to core in Vulkan 1.1). From the spec:

The VK_KHR_16bit_storage extension allows use of 16-bit types in shader input and output interfaces, and push constant blocks. This extension introduces several new optional features which map to SPIR-V capabilities and allow access to 16-bit data in Block-decorated objects in the Uniform and the StorageBuffer storage classes, and objects in the PushConstant storage class. This extension allows 16-bit variables to be declared and used as user-defined shader inputs and outputs but does not change location assignment and component assignment rules.

While the shaderInt16 capability provides the means to operate with 16-bit integers inside a shader, the VK_KHR_16bit_storage extension provides developers with the means to also feed shaders with 16-bit integer (and also floating point) input data, such as Uniform/Storage Buffer Objects or Push Constants, from the applications side, plus, it also gives the opportunity for linked shader stages in a graphics pipeline to consume 16-bit shader inputs and produce 16-bit shader outputs.

VK_KHR_16bit_storage and shaderInt16 should be seen as two parts of a whole, each addressing one part of a larger problem. VK_KHR_16bit_storage can help reduce memory bandwidth for Uniform and Storage Buffer data accessed from shaders and/or optimize Push Constant space, of which there are only a few bytes available, making it a precious shader resource. But without shaderInt16, shaders that are fed 16-bit input data are still required to convert this data to 32-bit internally for operation (and then back again to 16-bit for output if needed). Likewise, shaders that use shaderInt16 without VK_KHR_16bit_storage can only operate with 16-bit data that is generated inside the shader, which largely limits its usage. Both together, however, give you the complete functionality.
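As a sketch of how the two would be enabled together at device creation time, assuming Vulkan 1.1 (where VK_KHR_16bit_storage is core; with Vulkan 1.0 you would enable the extension and use the KHR-suffixed names instead). Checking feature support via vkGetPhysicalDeviceFeatures2() is left out for brevity:

    #include <vulkan/vulkan.h>

    /* Enable shaderInt16 together with the 16-bit storage features at device
     * creation. The caller owns the structs and has verified support. */
    static void
    enable_int16_features(VkDeviceCreateInfo *device_info,
                          VkPhysicalDeviceFeatures *features,
                          VkPhysicalDevice16BitStorageFeatures *storage16)
    {
        features->shaderInt16 = VK_TRUE;   /* 16-bit integer operations in shaders */

        storage16->sType =
            VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_16BIT_STORAGE_FEATURES;
        storage16->storageBuffer16BitAccess           = VK_TRUE; /* 16-bit SSBO members   */
        storage16->uniformAndStorageBuffer16BitAccess = VK_TRUE; /* 16-bit UBO members    */
        storage16->storagePushConstant16              = VK_TRUE; /* 16-bit push constants */

        /* the classic features go into pEnabledFeatures, the 16-bit storage
         * features are chained into the pNext list */
        device_info->pEnabledFeatures = features;
        storage16->pNext = (void *)device_info->pNext;
        device_info->pNext = storage16;
    }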

Conclusions

We are very happy to continue expanding the feature set supported in Anvil and we look forward to seeing application developers making good use of shaderInt16 in Vulkan to improve shader performance. As noted above, this is the first time that we fully enable the shader compiler backend to do general purpose operations on lower bit-size data types and there might be things that we can still improve or optimize. If you hit any issues with the implementation, please contact us and / or file bug reports so we can continue to improve the implementation.

Last week I fixed the last change from review feedback and landed the V3D (vc5/6 kernel DRM driver) to drm-misc-next. This means, unless something exceptional happens, it’ll be in kernel 4.18. I also sent out a patch series renaming the Mesa driver to V3D as well.

For VC4, I landed Stefan Schake’s syncobj patches in the kernel. Once that makes its way to drm-next, we can merge the Mesa side (EGL_ANDROID_native_fence_sync extension) and I believe get out-of-the-box Android HWC2 support.

I also merged a couple of old DSI patches of mine that got reviewed, my DPI regression fix, and a fix from Boris for a (valid) warning that was happening about the MADVISE refcounting versus async pageflips.

Boris has been busy working on adding display properties for TV underscan. Many HDMI monitors emulate old TV style scanout where, despite being 1920x1080 pixels, they read from a smaller area of their input and scale up. HDMI has infoframes to tell the TV whether the input is formatted for that old style scanout or not (as a desktop, we would always want them to scan out 1:1), but since the HDMI conformance tests didn't check it, most old HDMI TVs don't change behavior on that input and just always underscan (or you have to dig around in menus to get 1:1). The overscan setting in the closed source firmware configures the HVS to scale the 1920x1080 desktop down to the region that the TV will scale back up. The display will look bad, but it's better than losing your menu bars off the edge of the screen. Once we collectively decide what the new KMS property should be, Boris's patches should let people replicate this workaround in the KMS world.
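Just to illustrate what that might look like from userspace once a property exists, here is a libdrm sketch. The property name is passed in by the caller because, as said above, the actual KMS property has not been decided on yet, so any name used here is purely hypothetical:

    #include <stdint.h>
    #include <string.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Look up a connector property by name and set it. */
    static int
    set_connector_prop(int fd, uint32_t connector_id,
                       const char *name, uint64_t value)
    {
        drmModeObjectProperties *props;
        int ret = -1;

        props = drmModeObjectGetProperties(fd, connector_id,
                                           DRM_MODE_OBJECT_CONNECTOR);
        if (!props)
            return -1;

        for (uint32_t i = 0; i < props->count_props; i++) {
            drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);
            if (!prop)
                continue;
            if (strcmp(prop->name, name) == 0)
                ret = drmModeObjectSetProperty(fd, connector_id,
                                               DRM_MODE_OBJECT_CONNECTOR,
                                               prop->prop_id, value);
            drmModeFreeProperty(prop);
        }
        drmModeFreeObjectProperties(props);
        return ret;
    }

Hypothetical usage would then be something like set_connector_prop(fd, connector_id, "underscan", 1), with whatever name and value semantics the final property ends up having.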

Boris has also been working on a patch series to get VC4 to work with the DSI display in the DT, whether or not the DSI display is actually connected. We need this if we want to avoid having additional closed source firmware code hacking up the DT based on its own probing of the DSI display. His new series seems to have a chance of getting past the DT, panel, and bridge maintainers, but I have honestly lost count of which iteration of probing behavior we're on for this device (I think it's around 10 variations, though). It was all working at one point, before DT review requirements wrecked it.

April 30, 2018

Last weekend I attended Ubucon Europe 2018 in the city center of Gijón, Spain, at the Antiguo Instituto. This 3-day conference was full of talks, sometimes 3 or even 4 at the same time! The covered topics ranged from Ubuntu LoCo teams to Industrial Hardware… there was at least one interesting talk for every attendee! Please check the schedule if you want to know what happened at Ubucon ;-)

Ubucon Europe 2018

I would like to thank the whole organization team. They did a great job taking care of both the attendees and the speakers, and they organized a very nice traditional espicha that introduced good cider and Asturian gastronomy to people coming from all over the world. The conference was amazing and I am very pleased to have participated in this event. I recommend you attend next year… lots of fun and open source!

Regarding my talk “Introduction to Mesa, an open-source graphics API”, there were 30 people attending it. It was great to see that so many people were interested in a brief introduction to how Mesa works, how the community collaborates, and how anyone can become a contributor to Mesa, even as a non-developer. I hope I will see them contributing to Mesa in the near future!

You can find the slides of my talk here.

Ubucon Europe 2018 talk

For VC4, I landed Stefan Schake’s patches for CTM support (notably Android color correction) and per-plane alpha (so you can fade planes in and out without rewriting the alpha in all the pixels). He also submitted patches to the kernel and Mesa for syncobjects/fence fd support (needed for Android HWC2), which I gave some simplifying feedback on but which are otherwise almost ready to land.

For V3D, I incorporated Daniel Vetter’s feedback to the DRM code and resubmitted it. I’m optimistic for merging to 4.18. I then spent a while trawling the HW bug reports looking for reasons for my remaining set of GPU hangs on 7278, generating a large set of Mesa patches (most of which were assertions for those GPU restrictions I learned about, and which we aren’t triggering yet). While I was doing that, I also went through some CTS and piglit failures, and:

  • fixed gallium’s refcounting of separate stencil buffers
  • fixed reloads of separate-stencil buffers
  • added v4.x MSAA support
  • added centroid varying support
  • fixed a bit of 2101010 unorm/snorm texture support in gallium
  • fixed a few EGL tests from the GLES CTS
April 24, 2018

Been some time now since my last update on what is happening in Fedora Workstation and with current plans to release Fedora Workstation 28 in early May I thought this could be a good time to write something. As usual this is just a small subset of what the team has been doing and I always end up feeling a bit bad for not talking about the avalanche of general fixes and improvements the team adds to each release.

Thunderbolt
Christian Kellner has done a tremendous job keeping everyone informed of his work making sure we have proper Thunderbolt support in Fedora Workstation 28. One important aspect of this improved Thunderbolt support for us is that a lot of upcoming docking stations will require it, and thus without this work you would not be able to use a wide range of docking stations. For a lot of screenshots and more details about how the Thunderbolt support is done I recommend reading this article on Christian's blog.

3rd party applications
It has taken us quite some time to get here, as getting this feature right involved both a lot of internal discussion about the policies around it and plenty of implementation detail. But starting with Fedora Workstation 28 you will be able to find more 3rd party software listed in GNOME Software if you enable it. The way it will work is that, as part of the initial setup, you will be asked if you want 3rd party software to show up in GNOME Software. If you are upgrading, you will be asked inside GNOME Software if you want to enable 3rd party software. You can also disable 3rd party software after enabling it from the GNOME Software settings, as seen below:

GNOME Software settings

In Fedora Workstation 27 we did have PyCharm available, but we have now added the NVidia driver and Steam to the list for Fedora Workstation 28.

We have also been working with Google to try to get Chrome included here and we are almost there; they merged the needed AppStream metadata some time ago, for instance, but the last step requires some tweaking of how Google generates their package repository (basically adding the AppStream metadata to their yum repository). We don’t have a clear timeline for when that will happen, but as soon as it does, Chrome will also appear in GNOME Software if you have 3rd party software enabled.

As we speak all 3rd party packages are RPMs, but we expect that going forward we will be adding applications packaged as Flatpaks too.

Finally if you want to propose 3rd party applications for inclusion you can find some instructions for how to do it here.

Virtualbox guest
Another major feature that got some attention and that we worked on for this release was Hans de Goede's work to ensure Fedora Workstation can run as a VirtualBox guest out of the box. We know there are many people who have their first experience with Linux running it under VirtualBox on Windows or MacOSX and we wanted to make that first experience as good as possible. Hans worked with the VirtualBox team to clean up their kernel drivers and agree on a stable ABI so that they could be merged into the kernel and maintained there from now on.

Firmware updates
The Spectre/Meltdown situation hammered home to a lot of people the need to have firmware updates easily available and easy to apply. We created the Linux Vendor Firmware Service for Fedora Workstation users with that in mind and it was great to see the service paying off for many Linux users, not only on Fedora, but also on other distributions who started using the service we provided. I would like to call out Dell, who was a critical partner for the Linux Vendor Firmware effort from day 1, and thus their users got the most benefit from it when Spectre and Meltdown hit. Spectre and Meltdown also helped get a lot of other vendors off the fence or accelerated their efforts to support LVFS, and Richard Hughes and Peter Jones have been working closely with a lot of new vendors during this cycle to get support for their hardware and devices into LVFS. In fact Peter even flew down to the offices of one of the biggest laptop vendors recently to help them resolve the last issues before their hardware starts showing up in the firmware service. Thanks to the work of Richard Hughes and Peter Jones you will see both a wider range of brands and a wider range of device classes supported in the Linux Vendor Firmware Service in Fedora Workstation 28.

Server side GL Vendor Neutral Dispatch
This is a bit of a technical detail, but Adam Jackson and Lyude Paul have been working hard this cycle on getting what we call Server side GLVND ready for Fedora Workstation 28. Currently we are looking at enabling it either as a zero-day update or shortly afterwards. So what is Server Side GLVND, you say? Well, it is basically the last missing piece we need to enable the use of the NVidia binary driver through XWayland. Currently the NVidia driver works with Wayland native OpenGL applications, but if you are trying to run an OpenGL application that requires X we need this to support it. And to be clear, once we ship this in Fedora Workstation 28 it will also require a driver update from NVidia to use it, so us shipping it is just step 1 here. We also expect there to be some need for further tuning once we get all the pieces released to get top notch performance. Of course over time we hope and expect all applications to become Wayland native, but this is a crucial transition technology for many of our users. Of course if you are using Intel or AMD graphics with the Mesa drivers things already work great and this change will not affect you in any way.

Flatpak
Flatpaks basically already work, but we have kept our focus this time around on fleshing out the story in terms of the so-called Portals. Portals are essentially how applications are meant to interact with things outside of their container on your desktop. Jan Grulich has put in a lot of great effort making sure we get portal support for Qt and KDE applications, most recently by adding support for the screen capture portal on top of PipeWire. You can read more about that on Jan Grulich's blog. He is now focusing on getting the printing portal working with Qt.

Wim Taymans has also kept going full steam ahead on PipeWire, which is critical for us to enable applications dealing with cameras and similar devices on your system to be containerized. More details on that in my previous blog entry talking specifically about PipeWire.

It is also worth noting that we are working with Canonical engineers to ensure Portals also works with Snappy as we want to ensure that developers have a single set of APIs to target in order to allow their applications to be sandboxed on Linux. Alexander Larsson has already reviewed quite a bit of code from the Snappy developers to that effect.

Performance work
Our engineers have spent significant time looking at various performance and memory improvements since the last release. The main credit for fixing the recently much-discussed ‘memory leak’ goes to Georges Basile Stavracas Neto from Endless, but many from our engineering team helped with diagnosing it and also fixed many other smaller issues along the way. More details about the ‘memory leak’ fix in Georges' blog.

We are not done here though and Alberto Ruiz is organizing a big performance focused hackfest in Cambridge, England in May. We hope to bring together many of our core engineers to work with other members of the community to look at possible improvements. The Raspberry Pi will be the main target, but of course most improvements we do to make GNOME Shell run better on a Raspberry Pi also means improvements for normal x86 systems too.

Laptop Battery life
In our efforts to make Linux even better on laptops, Hans de Goede spent a lot of time figuring out things we could do to make Fedora Workstation 28 have better battery life. How valuable these changes are will of course depend on your exact hardware, but I expect more or less everyone to get a bit better battery life on Fedora Workstation 28, and for some it could be a lot better. You can read a bit more about these changes on Hans de Goede's blog.