planet.freedesktop.org
http://planet.freedesktop.org
Tomeu Vizoso: Rockchip NPU update 2: MobileNetV1 is done
https://blog.tomeuvizoso.net/2024/03/rockchip-npu-update-2-mobilenetv1-is.html
<h3 style="text-align: left;">Progress</h3><p style="text-align: left;">For the last couple of weeks I have kept chipping away at a new userspace driver for the NPU in the Rockchip RK3588 SoC.</p><p style="text-align: left;">I am very happy to report that the work has gone really smoothly and I have reached my first milestone: running the MobileNetV1 model with all convolutions accelerated by the NPU.</p><p style="text-align: left;">It not only runs flawlessly, but also at the same performance level as the blob.</p><p style="text-align: left;">It has been great having access to the register list as disclosed by Rockchip in their TRM, and to the NVDLA and ONNC documentation and source code. This has allowed the work to proceed several times faster than with my previous driver for the VeriSilicon NPU, for which a lot of painstaking reverse engineering had to be done.<br /></p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://commons.wikimedia.org/w/index.php?curid=285598" target="_blank"><span style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQiQSHVRGw-EMpuIKA6jxXH-ss_HgutqwgUYXvCg4tPMRq9Js2q7l0NGILTcRlBqDfUOMhKNdzAALj1E8dPN2zxd6aOK59OeO9f5ac0vaWuaEvDEl_EQLu6rd-887qRrMH_7tgG4_oSubzgI2_GCvVD5ck6ukwErppZc1AQ5RawYqzrcB-mec905-jYpI/s320/hen.jpg" width="320" /></span></a></td></tr><tr><td class="tr-caption" style="text-align: center;">by Julien Langlois CC BY-SA 3.0<br /></td></tr></tbody></table><p> <span style="font-family: courier;">tomeu@arm-64:~/mesa$ TEFLON_DEBUG=verbose python3.10 classification.py -i hens.jpg -m mobilenet_v1_1.0_224_quant.tflite -l labels_mobilenet_quant_v1_224.txt -e libteflon.so<br />Loading external delegate from libteflon.so with args: {}<br />Teflon delegate: loaded rknpu driver<br /><br />teflon: compiling graph: 
89 tensors 27 operations<br />...<br />teflon: compiled graph, took 413 ms<br />teflon: invoked graph, took 11 ms<br />teflon: invoked graph, took 11 ms<br />teflon: invoked graph, took 11 ms<br />teflon: invoked graph, took 10 ms<br />teflon: invoked graph, took 10 ms<br /><b>0.984314: hen</b><br />0.019608: cock<br />0.000000: toilet tissue<br />0.000000: sea cucumber<br />0.000000: wood rabbit<br />time: 10.776ms<br /></span></p><p style="text-align: left;"><span style="font-family: inherit;">Notice how nothing in the invocation refers to the specific driver that TensorFlow Lite is using; that is completely abstracted by Mesa. Once all these bits are upstream and packaged by distros, one will be able to just download a model in INT8 quantization format and get accelerated inference quickly, irrespective of the hardware.</span></p><p style="text-align: left;"><span style="font-family: inherit;">Thanks to TL Lim of <a href="https://pine64.org/">PINE64</a> for sending me a <a href="https://wiki.pine64.org/wiki/QuartzPro64_Development">QuartzPro64</a> board to hack on.<br /></span></p><h3 style="text-align: left;"><span style="font-family: inherit;">Next steps</span></h3><p style="text-align: left;"><span style="font-family: inherit;">I want to go back and get my latest performance work for the VeriSilicon driver upstreamed, so it is packaged in distros sooner rather than later.</span></p><p style="text-align: left;"><span style="font-family: inherit;">After that, I'm a bit torn between working further on the userspace driver (implementing more operations and control flow) and starting to write a kernel driver for mainline.<br /></span></p>2024-03-28T07:47:00+00:00
Simon Ser: Status update, March 2024
https://emersion.fr/blog/2024/status-update-62/
<p>Hi! It’s this time of the month once again it seems…</p>
<p>We’ve finally released <a href="https://github.com/swaywm/sway/releases/tag/1.9">Sway 1.9</a>! Note that it uses the new wlroots rendering
API, but doesn’t use the scene-graph API: we’ve left that for 1.10. We’ve also
released <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/releases/0.17.2">wlroots 0.17.2</a> with a whole bunch of bug fixes. Special thanks to
Simon Zeni for doing the backporting work!</p>
<p>In other Wayland news, the <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4548">wlroots merge request</a> to atomically
apply changes to multiple outputs has been merged! In addition, <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4567">another
merge request</a> to help compositors allocate the right kind of
buffers during modesets has been merged. Together, these two should help more
multi-output setups on Intel GPUs light up correctly; such setups previously
required a workaround (<code>WLR_DRM_NO_MODIFIERS=1</code>). Thanks to Kenny for helping
with that work!</p>
<p>I also got around to writing a <a href="https://github.com/swaywm/sway/pull/8063">Sway patch</a> to gracefully handle
GPU resets. This should be good news for users of a particular GPU vendor which
tends to be a bit trigger-happy with resets! Sway will now survive and continue
running instead of being frozen. Note that clients may still glitch, need a nudge
to redraw, or freeze. <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4604">A few</a> <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4606">wlroots</a>
<a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4607">patches</a> were also required to get this to work.</p>
<p>With the help of Jean Thomas, <a href="https://git.sr.ht/~emersion/goguma">Goguma</a> (and <a href="https://git.sr.ht/~emersion/pushgarden">pushgarden</a>) has gained support
for Apple Push Notification service (APNs). This means that Goguma iOS users
can now enjoy instantaneous notifications! This is also important to prove that
it’s possible to design a standard (as an <a href="https://github.com/ircv3/ircv3-specifications/pull/471">IRC extension</a>)
which doesn’t hardcode any proprietary platform (and thus doesn’t force each
IRC server to have one codepath per platform), but still interoperates with
these proprietary platforms (important for usability) and ensures that said
proprietary platforms have minimal access to sensitive data (via end-to-end
encryption between the IRC server and the IRC client).</p>
<p>It’s now also possible to share links and files to Goguma. That is, when using
another app (e.g. the gallery, your favorite fediverse client, and many
others) and opening the share menu, Goguma will show up as an option. It will
then ask which conversation to share the content with, and automatically upload
any shared file.</p>
<p>No <abbr title="New Project of the Month">NPotM</abbr> this time around sadly.
To make up for it, I’ve implemented refresh tokens in <a href="https://git.sr.ht/~emersion/sinwon">sinwon</a>, and made most
of the remaining tests pass in <a href="https://git.sr.ht/~emersion/go-mls">go-mls</a>.</p>
<p>See you next month!</p>2024-03-17T22:00:00+00:00
Tomeu Vizoso: Rockchip NPU update 1: A walk in the park?
https://blog.tomeuvizoso.net/2024/03/rockchip-npu-update-1-walk-in-park.html
<p>During the past weeks I have paused work on the driver for the Vivante NPU and have started work on a new driver, for Rockchip's own NPU IP, as used in SoCs such as the RK3588(S) and RK3568.<br /></p><p>The version of the NPU in the RK3588 claims a performance of 6 TOPS across its 3 cores, though from what I have read, people are having trouble making use of more than one core in parallel with the closed-source driver.<br /></p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGU-XeRJwraDc8PCTHTVdlrt4rM0QeZUuKNFA8WuB4Ogr51PgpWAhll2esCPZatq5SoYxIcyCAbQvahRiSiOCVSysu-dXyJu5gT0C-8hvt3mDe4Wuj_qg98pR_utgzeoyw3C042IDW3ZLgoZux7i877z-D684agsk1_QpYzE2pAO609Mnw1RIFVFE7UMM/s640/pexels-mart-production-8121657.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" height="214" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGU-XeRJwraDc8PCTHTVdlrt4rM0QeZUuKNFA8WuB4Ogr51PgpWAhll2esCPZatq5SoYxIcyCAbQvahRiSiOCVSysu-dXyJu5gT0C-8hvt3mDe4Wuj_qg98pR_utgzeoyw3C042IDW3ZLgoZux7i877z-D684agsk1_QpYzE2pAO609Mnw1RIFVFE7UMM/s320/pexels-mart-production-8121657.jpg" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><i>A nice walk in the park</i><br /></td></tr></tbody></table><p></p><p>Rockchip, like most other vendors of NPU IP, provides a GPLed kernel driver and pushes out their userspace driver in binary form. The kernel driver is pleasantly simple and relatively up-to-date in its use of internal kernel APIs. The userspace stack, though, is notoriously buggy and difficult to use, with basic features still unimplemented and performance quite below what the hardware should be able to achieve.</p><p>To be clear, this is on top of the usual problems related to closed-source drivers. 
I get the impression that Rockchip's NPU team is really understaffed.<br /></p><p>Other people had already looked at reverse-engineering the HW so they could address the limitations and bugs in the closed source driver, and use it in situations not supported by Rockchip. I used information acquired by <a href="https://github.com/phhusson/rknpu-reverse-engineering">Pierre-Hugues Husson</a> and <a href="https://github.com/mtx512/rk3588-npu/">Jasbir Matharu</a> to get started, a big thanks to them!<br /></p><p>After the initial environment was set up (I had to forward-port their kernel driver to v6.8), I wrote a simple library that can be loaded into the process with LD_PRELOAD and that, by overriding ioctl and other syscalls, dumps the buffers that the proprietary userspace driver sends to the hardware.</p><p>I started looking at a buffer that, according to the debug logs of the proprietary driver, contained register writes, and when looking at the register descriptions in the TRM, I saw that the hardware had to be closely based on NVIDIA's NVDLA open-source NPU IP.</p><p>With Rockchip's (terse) description of the registers, NVDLA's documentation and source code for both the hardware and the userspace driver, I have been able to make progress several times faster than I was able to when working on VeriSilicon's driver (for which I had zero documentation).</p><p>Right now I am at the stage at which I am able to correctly execute TensorFlow Lite's Conv2D and DepthwiseConv2D operations with different combinations of input dimensions, weight dimensions, strides and padding. 
Next is to support multiple output channels.</p><p>I'm currently using Rockchip's kernel, but as soon as I'm able to run object detection models with decent hardware utilization, I plan to start writing a new kernel driver for mainlining.</p><p>Rockchip's kernel driver has gems such as passing addresses in the kernel address space across the UAPI...<br /></p><p>Tests run fast and reliably, even with high concurrency:</p><p><span style="font-family: courier;"><span style="font-size: x-small;">tomeu@arm-64:~/mesa$ TEFLON_TEST_DELEGATE=~/mesa/build/src/gallium/targets/teflon/libteflon.so TEFLON_TEST_DATA=src/gallium/targets/teflon/tests LD_LIBRARY_PATH=/home/tomeu/tflite-vx-delegate/build/_deps/tensorflow-build/ ~/.cargo/bin/gtest-runner run --gtest /home/tomeu/mesa/build/src/gallium/targets/teflon/test_teflon --output /tmp -j8 --tests-per-group 1 --baseline ~/mesa/src/gallium/drivers/rocket/ci/rocket-rk3588-fails.txt --flakes ~/mesa/src/gallium/drivers/rocket/ci/rocket-rk3588-flakes.txt --skips ~/mesa/src/gallium/drivers/rocket/ci/rocket-rk3588-skips.txt <br />Running gtest on 8 threads in 1-test groups<br />Pass: 0, Duration: 0<br />Pass: 139, Skip: 14, Duration: 2, Remaining: 2<br />Pass: 277, Skip: 22, Duration: 4, Remaining: 0<br />Pass: 316, Skip: 24, Duration: 4, Remaining: 0</span></span></p>You can find the source code in <a href="https://gitlab.freedesktop.org/tomeu/mesa/-/tree/rocket?ref_type=heads">this branch</a>.<br /><p></p>2024-03-16T11:46:00+00:00
Christian Schaller: PipeWire camera handling is now happening!
https://blogs.gnome.org/uraeus/2024/03/15/pipewire-camera-handling-is-now-happening/
<p>We hit a major milestone this week, with the long-worked-on adoption of PipeWire camera support finally starting to land!</p>
<p>Not long ago <a href="https://jgrulich.cz/2024/01/30/how-to-use-pipewire-camera-in-firefox/">Firefox was released with experimental PipeWire camera</a> support thanks to the great work by Jan Grulich.</p>
<p>Then this week <a href="https://flathub.org/apps/com.obsproject.Studio">OBS Studio</a> shipped with PipeWire camera support thanks to the great work of Georges Stavracas, who cleaned up the patches and pushed to get them merged based on earlier work by himself, Wim Taymans and columbarius. This means we now have two major applications out there that can use PipeWire for camera handling, and thus two applications whose video streams can be interacted with through patchbay applications like <a href="https://flathub.org/apps/org.pipewire.Helvum">Helvum</a> and <a href="https://flathub.org/apps/org.rncbc.qpwgraph">qpwgraph</a>.<br />
These applications are important and central enough that having them use PipeWire is in itself useful, but they will now also serve as two examples for application developers looking at how to add PipeWire camera support to their own applications; there is no better documentation than working code.</p>
<p>The PipeWire support is also paired with camera portal support. The use of the portal also means we are getting closer to being able to fully sandbox media applications in Flatpaks, which is an important goal in itself. Which reminds me: to test out the new PipeWire support, be sure to grab the <a href="https://flathub.org/apps/com.obsproject.Studio">official OBS Studio Flatpak</a> from Flathub.</p>
<p></p><div class="wp-caption alignnone" id="attachment_10893" style="width: 310px;"><img alt="PipeWire camera handling with OBS Studio, Firefox and Helvum." class="size-medium wp-image-10893" height="169" src="https://blogs.gnome.org/uraeus/files/2024/03/pipewire-camera-300x169.png" width="300" /><p class="wp-caption-text" id="caption-attachment-10893">PipeWire camera handling with OBS Studio, Firefox and Helvum.</p></div><br />
Let me explain what is going on in the screenshot above, as it is a lot. First of all, you see Helvum there on the right showing all the connections made through PipeWire: the audio and, in yellow, the video. So you can see how my Logitech BRIO camera is feeding a camera video stream into both OBS Studio and Firefox. You also see my Magewell HDMI capture card feeding a video stream into OBS Studio, and finally gnome-shell providing a screen capture feed that is being fed into OBS Studio. On the left, at the top, you see Firefox running their WebRTC test app capturing my video; just below that you see the OBS Studio image with the direct camera feed in the top left corner, the screencast of Firefox just below it, and finally the ‘no signal’ image from my HDMI capture card, since I had no HDMI device connected to it as I was testing this.<p></p>
<p>For those wondering, work is also underway to bring this into the Chromium and Google Chrome browsers, where Michael Olbrich from Pengutronix has been pushing to get patches written and merged; <a href="https://archive.fosdem.org/2023/schedule/event/om_chromium/attachments/slides/5503/export/events/attachments/om_chromium/slides/5503/FOSDEM2023_Modern_Camera_Handling_in_Chromium.pdf">he did a talk about this work at FOSDEM last year, as you can see from these slides</a>, with <a href="https://webrtc-review.googlesource.com/c/src/+/264553">this patch</a> being the last step to get this working there too.</p>
<p>The move to PipeWire also prepared us for the new generation of MIPI cameras being rolled out in new laptops and helps push work on supporting those cameras towards <a href="https://libcamera.org/">libcamera</a>, the new library for dealing with the new generation of complex cameras. This of course ties well into the work that <a href="https://fosdem.org/2024/schedule/event/fosdem-2024-3013-a-fully-open-source-stack-for-mipi-cameras/">Hans de Goede and Kate Hsuan have been doing recently, along with Bryan O’Donoghue from Linaro, on providing an open source driver for MIPI cameras</a> and of course the incredible work by Laurent Pinchart and Kieran Bingham from <a href="https://ideasonboard.com/">Ideas on Board</a> on libcamera itself. </p>
<p>The PipeWire support is of course fresh, and I am sure we will find bugs and corner cases that need fixing as more people test out the functionality in both Firefox and OBS Studio, and there are some interface annoyances we are working to resolve. For instance, since PipeWire supports both V4L2 and libcamera as backends, you currently get double entries in your selection dialogs for most of your cameras. WirePlumber has implemented de-duplication code which will ensure only the libcamera listing shows for cameras supported by both V4L2 and libcamera, but it is only part of the development version of WirePlumber and thus will land in <a href="https://fedoraproject.org/workstation/download">Fedora Workstation 40</a>, so until that is out you will have to deal with the duplicate options.<br />
</p><div class="wp-caption alignnone" id="attachment_10896" style="width: 310px;"><img alt="Camera selection dialog" class="size-medium wp-image-10896" height="149" src="https://blogs.gnome.org/uraeus/files/2024/03/camera-selector-300x149.png" width="300" /><p class="wp-caption-text" id="caption-attachment-10896">Camera selection dialog</p></div><br />
We are also trying to figure out how to better deal with infrared cameras that are part of many modern webcams. Obviously you usually do not want to use an IR camera for your video calls, so we need to figure out the best way to identify them and ensure they are clearly marked and not used by default.<p></p>
<p>Another good recent PipeWire tidbit: with the <a href="https://gitlab.freedesktop.org/pipewire/pipewire/-/releases">PipeWire 1.0.4</a> release, PipeWire maintainer Wim Taymans also fixed up the FireWire FFADO support. The FFADO support had been in there for some time, but after seeing <a href="https://interfacinglinux.com/2024/02/02/firewire-audio-with-pipewire/">Venn Stone do some thorough tests</a> and find issues, we decided it was time to bite the bullet and buy some second-hand FireWire hardware for Wim to be able to test and verify himself.</p><div class="wp-caption alignright" id="attachment_10935" style="width: 310px;"><img alt="Focusrite firewire device" class="size-medium wp-image-10935" height="169" src="https://blogs.gnome.org/uraeus/files/2024/03/firewiredevice-300x169.jpeg" width="300" /><p class="wp-caption-text" id="caption-attachment-10935">Focusrite firewire device</p></div><br />
Once the Focusrite device I bought landed at Wim’s house he got to work, cleaned up the FFADO support, and made it both work and perform well.<br />
For those unaware, FFADO is a way to use FireWire devices without going through ALSA and is popular among pro-audio folks because it gives lower latencies. FireWire is of course a relatively old technology at this point, but the audio equipment is still great and many audio engineers have a lot of these devices, so with this fixed you can plop a FireWire PCI card into your PC and suddenly all those old FireWire devices get a new lease on life on your Linux system. And you can buy these devices on places like eBay or Facebook Marketplace for a fraction of their original cost. In some sense this demonstrates the same strength of PipeWire as the libcamera support: in the libcamera case it allows Linux applications to smoothly transition to a new generation of hardware, and in this FireWire case it allows Linux applications to keep using older hardware with new applications.<p></p>
<p>So all in all it’s been a great few weeks for PipeWire and for Linux Audio AND Video, and if you are an application maintainer, be sure to look at how you can add PipeWire camera support to your application and of course get that application packaged up as a Flatpak for people using Fedora Workstation and other distributions to consume.</p>2024-03-15T16:30:39+00:00
Daniel Vetter: Upstream, Why & How
http://blog.ffwll.ch/2024/03/upstream-why-how.html
<p>In a different epoch, before the pandemic, I’ve done a <a href="https://blog.ffwll.ch/2019/05/upstream-first.html">presentation about
upstream first</a> at the Siemens Linux Community
Event 2018, where I’ve tried to explain the fundamentals of open source using
microeconomics. Unfortunately that talk didn’t work out too well with an
audience that wasn’t well-versed in upstream and open source concepts, largely
because it was just too much material crammed into too little time.</p>
<p>Last year I got the opportunity to try again for an Intel-internal event series,
and this time I’ve split the material into two parts. I think that worked a lot
better. For obvious reasons I cannot publish the recordings, but I can publish
the slides.</p>
<p>The <a href="https://blog.ffwll.ch/slides/intel-gdansk-2023.pdf">first part “Upstream, Why?”</a> covers a
few concepts from microeconomics 101, and then applies them to upstream
open source. The key concept is that open source achieves an efficient
software market in the microeconomic sense by driving margins and prices to
zero. The only way to make money in such a market is either to have
more-or-less unstable barriers to entry that prevent the efficient market from
forming and destroying all monetary value, or to sell a complementary
product.</p>
<p>The <a href="https://blog.ffwll.ch/slides/intel-gdansk-2023-part2.pdf">second part “Upstream, How?”</a> then
looks at what this all means for the different stakeholders involved:</p>
<ul>
<li>
<p>Individual engineers, who have skills and create a product with zero economic
value, and might still be stupid enough to try to build a career on that.</p>
</li>
<li>
<p>Upstream communities, often with a formal structure as a foundation, and what
exactly their goals should be to build a thriving upstream open source project
that can actually pay some bills, generate some revenue somewhere else and get
engineers paid. Because without that you’re not going to have much of a
project with a long-term future.</p>
</li>
<li>
<p>Engineering organizations, what exactly their incentives and goals should
be, and the fundamental conflicts of interest this causes. Specifically on
this I’ve only seen bad solutions, and ugly solutions, but not yet a really
good one. A relevant pre-pandemic talk of mine on this topic is also
<a href="https://blog.ffwll.ch/2019/12/upstream-too-little-too-late.html">“Upstream Graphics: Too Little, Too
Late”</a></p>
</li>
<li>
<p>And finally the overall business and more importantly, what kind of business
strategy is needed to really thrive with an open source upstream first
approach: You need to clearly understand which software market’s economic
value you want to destroy by driving margins and prices to zero, and which
complementary product you’re selling to still earn money.</p>
</li>
</ul>
<p>At least judging by the feedback I’ve received internally, taking more time and
going a bit more in-depth on the various concepts worked much better than the
keynote presentation I’d done at Siemens, hence I decided to publish at
least the slides.</p>2024-03-14T00:00:00+00:00
Peter Hutterer: Enforcing a touchscreen mapping in GNOME
http://who-t.blogspot.com/2024/03/enforcing-touchscreen-mapping-in-gnome.html
<p>
Touchscreens are quite prevalent by now but one of the not-so-hidden secrets is that they're actually two devices: the monitor and the actual touch input device. Surprisingly, users want the touch input device to work on the underlying monitor which means your desktop environment needs to somehow figure out which of the monitors belongs to which touch input device. Often these two devices come from two different vendors, so mutter needs to use ... */me holds torch under face* .... HEURISTICS! :scary face:
</p>
<p>
Those heuristics are actually quite simple: same vendor/product ID? same dimensions? is one of the monitors a built-in one? [1] But unfortunately in some cases those heuristics don't produce the correct result. In particular external touchscreens seem to be getting more common again and plugging those into a (non-touch) laptop means you usually get that external screen mapped to the internal display.
</p>
<p>
Luckily mutter does have a configuration for this, though it is not exposed in GNOME Settings (yet). But you, my $age $jedirank, can access it via a commandline interface to at least work around the immediate issue. But first: we need to know the monitor details, and you need to know about gsettings relocatable schemas.
</p>
<p>
Finding the right monitor information is relatively trivial: look at <b>$HOME/.config/monitors.xml</b> and get your monitor's vendor, product and serial from there. e.g. in my case this is:
</p><pre> <monitors version="2">
<configuration>
<logicalmonitor>
<x>0</x>
<y>0</y>
<scale>1</scale>
<monitor>
<monitorspec>
<connector>DP-2</connector>
<vendor>DEL</vendor> <--- this one
<product>DELL S2722QC</product> <--- this one
<serial>59PKLD3</serial> <--- and this one
</monitorspec>
<mode>
<width>3840</width>
<height>2160</height>
<rate>59.997</rate>
</mode>
</monitor>
</logicalmonitor>
<logicalmonitor>
<x>928</x>
<y>2160</y>
<scale>1</scale>
<primary>yes</primary>
<monitor>
<monitorspec>
<connector>eDP-1</connector>
<vendor>IVO</vendor>
<product>0x057d</product>
<serial>0x00000000</serial>
</monitorspec>
<mode>
<width>1920</width>
<height>1080</height>
<rate>60.010</rate>
</mode>
</monitor>
</logicalmonitor>
</configuration>
</monitors>
</pre>
Well, so we know the monitor details we want. Note there are two monitors listed here, in this case I want to map the touchscreen to the external Dell monitor. Let's move on to gsettings.
<p></p>
<p>
gsettings is of course the configuration storage wrapper GNOME uses (and the CLI tool with the same name). GSettings follows a specific schema, i.e. a description of a schema name and the possible keys and values for each key. You can list all of those, set them, look up the available values, etc.:
</p><pre>
$ gsettings list-recursively
... lots of output ...
$ gsettings set org.gnome.desktop.peripherals.touchpad click-method 'areas'
$ gsettings range org.gnome.desktop.peripherals.touchpad click-method
enum
'default'
'none'
'areas'
'fingers'
</pre>
Now, schemas work fine as-is as long as there is only one instance. Where the same schema is used for different devices (like touchscreens) we use a so-called "relocatable schema", and that requires also specifying a path - and this is where it gets tricky. I'm not aware of any functionality to get the specific path for a relocatable schema, so often it's down to reading the source. In the case of touchscreens, the path includes the USB vendor and product ID (in lowercase), e.g. in my case the path is:
<pre> /org/gnome/desktop/peripherals/touchscreens/04f3:2d4a/
</pre>
In your case you can get the touchscreen details from lsusb, libinput record, /proc/bus/input/devices, etc. Once you have it,
gsettings takes a <b>schema:path</b> argument like this:
<pre> $ gsettings list-recursively org.gnome.desktop.peripherals.touchscreen:/org/gnome/desktop/peripherals/touchscreens/04f3:2d4a/
org.gnome.desktop.peripherals.touchscreen output ['', '', '']
</pre>
Looks like the touchscreen is bound to no monitor. Let's bind it with the data from above:
<pre>
$ gsettings set org.gnome.desktop.peripherals.touchscreen:/org/gnome/desktop/peripherals/touchscreens/04f3:2d4a/ output "['DEL', 'DELL S2722QC', '59PKLD3']"
</pre>
Note the quotes so your shell doesn't misinterpret things.
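If you find yourself doing this often, the monitors.xml lookup can be scripted. Below is a minimal Python sketch (the helper name <b>monitor_triples</b> is mine, not part of mutter or GNOME) that extracts the [vendor, product, serial] triples the <b>output</b> key expects from a monitors.xml document:

```python
# Hypothetical helper, not part of mutter or GNOME: pull the
# [vendor, product, serial] triples out of a monitors.xml document.
import xml.etree.ElementTree as ET

def monitor_triples(xml_text):
    """Return one [vendor, product, serial] list per <monitorspec> entry."""
    root = ET.fromstring(xml_text)
    return [
        [spec.findtext("vendor"), spec.findtext("product"), spec.findtext("serial")]
        for spec in root.iter("monitorspec")
    ]
```

Feed it the contents of <b>$HOME/.config/monitors.xml</b>, pick the triple for the monitor you want, and paste it into the <b>gsettings set ... output</b> command shown above.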
<p></p>
<p>
And that's it. Now I have my internal touchscreen mapped to my external monitor which makes no sense at all but shows that you can map a touchscreen to any screen if you want to.
</p>
<p>
<small>[1] Probably the one that most commonly takes effect since it's the vast vast majority of devices</small>
</p>2024-03-12T04:33:44+00:00
Mike Blumenkrantz: Post Interfaces
https://www.supergoodcode.com/post-interfaces/
<h1 id="march">March.</h1>
<p>I’ve had a few things I was going to blog about over the past month, but then news sites picked them up and I lost motivation because there’s only so many hours in a day that anyone wants to spend reading things that aren’t specification texts. Yeah, that’s my life now.</p>
<p>Anyway, a lot’s happened, and I’d try to enumerate it all but I’ve forgotten / lost track / don’t care. <code class="language-plaintext highlighter-rouge">git log</code> me if you’re interested. Some highlights:</p>
<ul>
<li>damage stuff is in</li>
<li>RADV supports shader objects so zink can run Tomb Raider (2013) without stuttering</li>
<li>NVK is about to hit GL conformance on all versions</li>
<li>I’m working on too many projects to keep track of everything</li>
</ul>
<p>More on the last one later. Like in a couple months. When I won’t get vanned for talking about it.</p>
<p>No, it’s not Half Life 3 / Portal 3 / L4D3.</p>
<h1 id="interfaces">Interfaces</h1>
<p>Today’s post was inspired by interfaces: they’re the things that make code go brrrrr. Basically Legos, but for adults who never go outside. If you’ve written code, you’ve done it using an interface.</p>
<p>Graphics has interfaces too. OpenGL is an interface. Vulkan is an interface.</p>
<p>Mesa has interfaces. It’s got some neat ones like Gallium which let you write a whole GL driver without knowing anything about GL.</p>
<p>And then it’s got the DRI interfaces. Which, by their mere existence, answer the question “What could possibly be done to make WSI even worse than it already is?”</p>
<p>The DRI interfaces date way back to a time before the blog. A time when now-dinosaurs roamed the earth. A time when Vulkan was but a twinkle in the eye of Mantle, which didn’t even exist. I’m talking <code class="language-plaintext highlighter-rouge">Copyright 1998-1999 Precision Insight, Inc., Cedar Park, Texas.</code> at the top of the file old.</p>
<p>The point of these interfaces was to let external applications access GL functionality. Specifically the xserver. This was before GLAMOR combined GBM and EGL to enable a better way of doing things that didn’t involve brain damage, and it was a necessary evil to enable cross-vendor hardware acceleration using Mesa. Other historical details abound, but this isn’t a textbook. The DRI interfaces did their job and enabled hardware-accelerated display servers for decades.</p>
<p>Now, however, they’ve become cruft. A hassle. A roadblock on the highway to a future where I can run zink on stupid platforms with ease.</p>
<h1 id="problem">Problem</h1>
<p>The first step to admitting there’s a problem is having a problem. I think that’s how the saying goes, anyway. In Mesa, the problem is any time I (or anyone) want to do something related to the DRI frontend, like <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27628">allow NVK to use zink by default</a>, it has to go through DRI. Which means going through the DRI interfaces. Which means untangling a mess of unnecessary function pointers with versioned prototypes meaning they can’t be changed without adding a new version of the same function and adding new codepaths which call the new version if available. And guess how many people in the project truly understand how all the layers fit together?</p>
<p>It’s a mess. And more than a mess, it’s a huge hassle any time a change needs to be made. Not only do the interfaces have to be versioned and changed, someone looking to work on a new or bitrotted platform has to first chase down all the function pointers to see where the hell execution is headed. Even when the function pointers always lead to the same place.</p>
<p>I don’t have any memes today.</p>
<p>This is my declaration of war.</p>
<p>DRI interfaces: you’re officially on notice. I’m <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28138">coming for you</a>.</p>2024-03-12T00:00:00+00:00Tomeu Vizoso: Etnaviv NPU update 17: Faster!
https://blog.tomeuvizoso.net/2024/02/etnaviv-npu-update-17-faster.html
<p>In the last update I explained how compression of zero weights gave our driver such a big performance improvement.</p><p>Since then, I have explored further what could take us closer to the performance of the proprietary driver and saw the opportunity to gather some of the proverbial low-hanging fruit.</p><h4 style="text-align: left;">TL;DR</h4><p style="text-align: left;">Our driver's performance on SSD MobileDet went from 32.7 ms to 24.8 ms, against the proprietary driver's 19.5 ms.</p><p style="text-align: left;">On MobileNetV1, our driver went from 9.9 ms to 6.6 ms, against the proprietary driver's 5.5 ms. Pretty close!<br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiO-x0rNGjtToQ2tdZD06wLekaZfisubI0jCp4BSJunHgf9yspA3b86Sz_XvtZh8IT565W2NXBPCnWHCbiimwFhyphenhyphenArSVPwTT0Q1mqMl2pxxjBh6JVEjh9ikXFEEVLxgNbUxGvjaBMCB0uUeB9BszKvyvwxzWZ5Itiq24PKvNUsWr2m-xGbDlwqmvaP68_4/s848/perf_evol_2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="326" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiO-x0rNGjtToQ2tdZD06wLekaZfisubI0jCp4BSJunHgf9yspA3b86Sz_XvtZh8IT565W2NXBPCnWHCbiimwFhyphenhyphenArSVPwTT0Q1mqMl2pxxjBh6JVEjh9ikXFEEVLxgNbUxGvjaBMCB0uUeB9BszKvyvwxzWZ5Itiq24PKvNUsWr2m-xGbDlwqmvaP68_4/w640-h326/perf_evol_2.png" width="640" /></a></div><p></p><h4 style="text-align: left;">Enable more convolutions</h4><p>Our driver
was rejecting convolutions whose number of output channels is not
divisible by the number of convolution cores in the NPU, because at the
start of development the code that lays the weights out in memory
didn't support that. That caused TensorFlow Lite to run those convolutions
on the CPU, and some of them were big enough to take a few milliseconds,
several times longer than on the NPU.<br /></p><p>While implementing support
for bigger kernels, I had to improve the tiling of the convolutions, and
that work happened to add support for these other channel counts as well.
So by just removing the rejection, we got a nice
speed up on SSD MobileDet: from 32.7 ms to 27 ms!</p><p>That didn't help on MobileNetV1, because all of its convolutions have output channel counts that divide evenly.</p><h4 style="text-align: left;">Caching of the input tensor</h4><p>So far we were only caching the kernels in the on-chip SRAM. I spent some time looking at how the proprietary driver sets the various caching fields and found a way to cache a portion of the input tensor in the remaining internal SRAM.</p><p>That got us the rest of the performance improvement mentioned above, but I am having trouble with some combinations of parameters when input tensor caching is enabled, so I need to get to the bottom of that before I submit it for review.</p><h4 style="text-align: left;">Next steps</h4><p>At this point I am pretty confident that we can get quite close to the performance of the proprietary driver without much additional work, as a few major performance features remain to be implemented, and I know that I still need to take another pass at tuning some of the previous performance work.</p><p>But after the input tensor caching is finished, and before I move on to any other improvements, I think I will invest some time in adding profiling facilities so I can better direct the efforts and get the best returns.</p>2024-02-23T12:10:00+00:00Mike Blumenkrantz: Woof
https://www.supergoodcode.com/woof/
<h1 id="it-turns-out">It Turns Out</h1>
<p>…that this year is a lot busier than expected. Blog posts will probably come in small clusters here and there rather than with any sort of regular cadence.</p>
<p>But now I’m here. You’re here. Let’s get cozy for a few minutes.</p>
<h1 id="nvk-oclock">NVK O’clock</h1>
<p>I’m sure you’ve seen some news, you’ve been trawling <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27628">the gitlab MRs</a>, you’re on the <code class="language-plaintext highlighter-rouge">#nouveau</code> channels. You’re one of my readers, so we both know you must be an expert.</p>
<p>Zink on NVK is happening.</p>
<p>Those of you who remember the zink XDC talk know that this work has been ongoing for a while, but now I can finally reveal the real life twist that only a small number of need-to-know community members have been keeping under wraps for years: <strong>I still haven’t been to XDC yet.</strong></p>
<p>Let me explain.</p>
<p>I’m sure everyone recalls the point in the presentation where “I” talked about progress made towards Zink on NVK. A lot of people laughed it off; oh sure, you said, that’s just the usual sort of joke we expect. But what if I told you it wasn’t a joke? That all of it was 100% accurate, it just hadn’t happened yet?</p>
<p>I know what you’re thinking now, and you’re absolutely correct. The me that attended XDC was actually time traveling from the future. A future in which Zink on NVK is very much finished. Since then, I’ve been slowly and quietly “backporting” the patches my future self wrote and slipping them into git.</p>
<p>Let’s look at an example.</p>
<h1 id="the-great-gaming-bug-of-24">The Great Gaming Bug Of ‘24</h1>
<p>20 Feb 2024 was a landmark day in my future-journal for a number of reasons, not the least due to the alarming effects of planetary alignment that you’re all no doubt monitoring. For the purposes of the current blog post that I’m now writing, however, it was monumental for a different reason. This was the day that noted zinkologist and current record-holder for Most Tests Fixed With One Line Of Code, Faith Ekstrand (@gfxstrand), would delve into debugging the most serious known issue in zink+nvk:</p>
<p><a href="https://www.supergoodcode.com/assets/nvk-stk.png"><img alt="nvk-stk.png" src="https://www.supergoodcode.com/assets/nvk-stk.png" /></a></p>
<p>Yup, it’s another clusterfuck.</p>
<p>Now let me say that I had the debug session noted down in my journal, but I didn’t add details. If you haven’t been in #nouveau for a live debug session, it’s worth scheduling time around it. Get some popcorn ready. Put on your safety glasses and set up your regulation-size splatterguard, all the usual, and then…</p>
<p>Well, if I had to describe the scene, it’s like watching someone feed a log into a wood chipper. All the potential issues investigated one-by-one and eliminated into the pile of growing sawdust.</p>
<p>Anyway, it turns out that NVK (currently) does not expose a BAR memory type with host-visible and device-local properties, and zink has no handling for persistently mapped buffers in this scenario. I carefully cherry-picked the appropriate patch from my futurelog and <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27707">rammed it through CI</a> late at night when nobody would notice.</p>
<p>As a result, <strong>all GL games now work on NVK</strong>. No hyperbole. They just work.</p>
<p>Stay tuned for future updates backported from a time when I’m not struggling to find spare seconds under the watchful gaze of Big Triangle.</p>2024-02-21T00:00:00+00:00Simon Ser: Status update, February 2024
https://emersion.fr/blog/2024/status-update-61/
<p>Hi! February is FOSDEM month, and as usual I’ve come to Brussels to meet with a
lot of other FOSS developers and exchange ideas. I like to navigate between the
buildings and along the hallways, looking for nice people to chat with. This
edition I’ve been involved in the new modern e-mail devroom and I’ve given a
<a href="https://spacepub.space/w/p/7UMbsDaTTt5o1u63kTTd1X?playlistPosition=5&resume=true">talk about IMAP</a> with Damian, a fellow IMAP library maintainer and organizer
of this devroom. The whole weekend was great!</p>
<p>In wlroots news, I’ve worked on multi-connector atomic commits. Right now,
wlroots configures outputs one at a time. This is slow and makes
it impossible to properly handle GPU limitations such as bandwidth: if the GPU
cannot drive two outputs with a 4k resolution, we’ll only find out after the
first one has been lit up. As a result we can’t properly implement fallbacks
and this results in black screens on some setups. In particular, on Intel some
users need to set <code>WLR_DRM_NO_MODIFIERS=1</code> to have their multi-output setup
work correctly. The multi-connector atomic commit work is the first step toward
resolving these situations, and it also results in faster modesets. The second step
will be to add fallback logic to use a less bandwidth-intensive scanout buffer
on modeset.</p>
<p>While working on the wlroots DRM backend code, I’ve also taken the opportunity
to clean up the internals and skip unnecessary modesets when switching between
VTs. <kbd>Ctrl</kbd> <kbd>Alt</kbd> <kbd>1</kbd> should be faster now! I’ve
also tried to resurrect the <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/3457">ext-screencopy-v1</a> protocol, required for
capturing individual windows. I’ve pushed a new version and reworked the
wlroots implementation; hopefully I can find some more time next month to
continue on this front.</p>
<p>Sway 1.9-rc4 has recently been released, and my reading of the tea leaves at my
disposal indicates that the final release may be shipped soon. Sway 1.9 will
leverage the new wlroots rendering API, however it does not include the huge
scene-graph rework that Alexander has pushed forward in the last year or so.
Sway 1.10 will be the first release to include this major overhaul and all the
niceties it unlocks. And Sway 1.10 will also <em>finally</em> support input method
popups (used for CJK among other things) thanks to efforts by Access and Tadeo
Kondrak.</p>
<p>The <abbr title="New Project of the Month">NPotM</abbr> is <a href="https://sr.ht/~emersion/sinwon/">sinwon</a>, a simple
OAuth 2 server for small deployments. I’ve long been trying to find a good
solution to delegate authentication to a single service and provide
single-sign-on for my personal servers. I’ve come to like OAuth 2 because it’s
a standard, it’s not tied to another use-case (like IMAP or SMTP is), and it
prevents other services from manipulating user passwords directly. sinwon
stores everything in a SQLite database, and it’s pretty boring: no fancy
cryptography usage for tokens, no fancy cloud-grade features. I like boring.
sinwon has a simple UI to manage users and OAuth clients (sometimes called
“apps”). Still missing are refresh tokens, OAuth scopes, an audit log, personal
access tokens, and more advanced features such as TOTP, device authorization
grants and mTLS. Patches welcome!</p>
<p>I’ve continued my work to make it easier to contribute to the SourceHut
codebase. Setting up PGP keys is now optional to run a SourceHut instance,
and a local S3-compatible server (such as minio) can be used without TLS.
Thorben Günther has added paste.sr.ht to <a href="https://git.sr.ht/~emersion/sr.ht-container-compose">sr.ht-container-compose</a>. I’m also
working on making services use meta.sr.ht’s GraphQL API instead of maintaining
their own copy of the user’s profile, but more needs to be done there.</p>
<p>And now for the random collection of smaller updates… The soju IRC bouncer and
the goguma IRC client for mobile devices now support file uploads: no need to
use an external service anymore to share a screenshot or picture in an IRC
conversation. Conrad Hoffmann and Thomas Müller have added support for multiple
address books to the go-webdav library, as well as creating/deleting address
books and calendars. I’ve modernized the FreeDesktop e-mail server setup with
SPF, DKIM and DMARC. KDE developers have contributed a new layer-shell minor
version to support docking their panel to a corner of the screen.</p>
<p>That’s all for now, see you next month!</p>2024-02-19T22:00:00+00:00Donnie Berkholz: The lazy technologist’s guide to fitness
https://dberkholz.com/2024/02/05/the-lazy-technologists-guide-to-fitness/
<p>In the past 8 months, I’ve lost 60 pounds and gone from completely sedentary to well on my way towards becoming fit, while putting in a minimum of effort. On the fitness side, I’ve taken my cardiorespiratory fitness from below average to above average, and I’m visibly stronger (I can do multiple pull-ups!). Again, I’ve aimed to do so with minimal effort to maximize my efficiency.</p>
<p>Here’s what I wrote in my <a href="https://dberkholz.com/2024/01/17/the-lazy-technologists-guide-to-weight-loss/">prior post on weight loss</a>:</p>
<p>I have no desire to be a bodybuilder, but I want to be in great shape now and be as healthy and mobile as possible well into my old age. And a year ago, my blood pressure was already at pre-hypertension levels, despite being at a relatively young age.</p>
<p><a href="https://www.ahajournals.org/doi/10.1161/CIRCULATIONAHA.117.032047">Research shows</a> that 5 factors are key to a long life — extending your life by 12–14 years:</p>
<ul>
<li>Never smoking</li>
<li>BMI of 18.5–24.9</li>
<li>30+ min a day of moderate/vigorous exercise</li>
<li>Moderate alcohol intake (vs none, occasional, or heavy)
<ul>
<li>Unsurprisingly, there is vigorous scientific and philosophical/religious/moral debate about this one; however, all studies agree that heavy drinking is bad.</li>
</ul>
</li>
<li>Diet quality in the upper 40% (Alternate Healthy Eating Index)</li>
</ul>
<p>In addition, people who are in good health have a much shorter end-of-life period. This means they extend the healthy portion of their lifespan (the “<a href="https://www.nature.com/articles/s41536-021-00169-5">healthspan</a>”) and compress the worst parts into a shorter period at the very end. Having seen many grandparents go through years of struggle as they grew older, I wanted my own story to have a different ending.</p>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/01/16198ec8-639a-4c64-bba6-34bb4f0a3783.jpeg"><img alt="" class="wp-image-1408" height="300" src="https://dberkholz.files.wordpress.com/2024/01/16198ec8-639a-4c64-bba6-34bb4f0a3783.jpeg?w=300" width="300" /></a></figure>
<p>Although I’m not a smoker, I was missing three of the other factors. My weight was massively unhealthy, I didn’t exercise at all and spent most of my day in front of a desk, and my diet was awful. I do drink moderately, however (almost entirely beer).</p>
<p>This post accompanies my earlier writeup, “<a href="https://dberkholz.com/2024/01/17/the-lazy-technologists-guide-to-weight-loss/">The lazy technologist’s guide to weight loss</a>.” Check that out for an in-depth, science-driven review of my experience losing weight. </p>
<p><strong>Why is this the lazy technologist’s guide, again?</strong> I wanted to lose weight in the “laziest” way possible — in the same sense that lazy programmers find the most efficient solutions to problems, according to an apocryphal quote by Bill Gates and a real one by Larry Wall, creator of Perl. Gates supposedly said, “I choose a lazy person to do a hard job. Because a lazy person will find an easy way to do it.” Wall wrote in <em>Programming Perl</em>, “Laziness: The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful and document what you wrote so you don’t have to answer so many questions about it.”</p>
<p>What’s the lowest-effort, most research-driven way to become fit as quickly as possible, during and after losing weight? Discovering and executing upon that was my journey. Read on if you’re considering taking a similar path.</p>
<h3 class="wp-block-heading">Cardio Fitness</h3>
<p>My initial goal for fitness was simply to meet the “30+ min/day” factor in the research study I cited at the beginning of this post, while considering a few factors:</p>
<ul>
<li>First, this is intended to be the lazy way, so there should be no long and intense workouts unless unavoidable. </li>
<li>Second, I did not want to buy a bunch of equipment or need to pay for a gym membership. Any required equipment should be inexpensive and small.</li>
<li>Third, I wanted to avoid creating any joint issues that would affect me negatively later in life. I was particularly concerned about high-impact, repetitive stress from running on hard surfaces, which I’d heard could be problematic.</li>
</ul>
<p>Joint issues become very common for older people, especially in the knees and hips. My program needed to avoid any high-impact, repetitive stress on those joints to preserve maximum function. I’ve always heard that running is bad for your knees, but after I looked into it, the <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9983113/">research</a> <a href="https://pubmed.ncbi.nlm.nih.gov/30879445/">does</a> <a href="https://pubmed.ncbi.nlm.nih.gov/37555313/">not</a> bear that out. And yet, it remains a popular misconception among both the general population and doctors who do not frequently perform hip replacements.</p>
<p>However, I just don’t like running — I enjoy different activities if I’m going to be working hard physically, such as games like racquetball/squash/pickleball or self-defense (Krav Maga!). I’m also not a big fan of getting all sweaty in general, but especially in the middle of a workday. So I wanted an activity with a moderate rather than high level of exertion.</p>
<p>Low-impact options include walking, cycling, swimming, and rowing, among others. But swimming requires an indoor pool or year-round good weather, and rowing requires a specialized machine or boat, while I’m aiming to stay minimal. I also do not own a bicycle, nor is the snowy weather in Minnesota great for cycling in the winter (fat-tire bikes being an exception).</p>
<p>We’re left with <strong>walking</strong> as the primary activity. </p>
<h4 class="wp-block-heading">LISS — Low-Intensity Steady State</h4>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/02/17e8c1d0-5877-4154-a59c-7bb6ef8c3ada.jpeg"><img alt="" class="wp-image-1427" height="300" src="https://dberkholz.files.wordpress.com/2024/02/17e8c1d0-5877-4154-a59c-7bb6ef8c3ada.jpeg?w=300" width="300" /></a></figure>
<p>Initially, I started with only walking. This is called low-intensity steady state (LISS) cardio (cardiovascular, a.k.a. aerobic) exercise. Later, I also incorporated high-intensity interval training (HIIT) as the laziest possible way to further improve my cardiovascular health.</p>
<p>To bump walking up into a “moderate” level of activity, I need to walk at 3–4 mph. This is what’s sometimes called a “brisk” walk — 3 mph feels fast, and 4 mph is about as fast as I can go without changing into some weird competitive walking style.</p>
<p>I also need to hit 30+ minutes per day of this brisk walking. At first, I used a <a href="https://www.amazon.com/gp/product/B0BF4SBYGX/">“walking pad” treadmill</a> under my standing desk, which I bought for under $200 on Amazon. My goal was to integrate walking directly into my day with no dedicated time, and this seemed like a good path. However, it violates the minimalism requirement. I also learned that the pace is too fast to do much of anything at the desk besides watch videos or browse social media. So I broke this up into two 1-mile outdoor walks, one after lunch and another after dinner.</p>
<p>Each 1-mile walk takes 15–20 minutes. Fitting this into a workday requires me to block off 45–60 minutes for lunch, between lunch prep, time to eat, and the walk itself. I find this much easier than trying to create a huge block of time in the morning for exercise, because I do not naturally wake up early. In the evening, I’ll frequently extend the after-dinner walk to ~2 miles instead of 1 mile.</p>
<p>It turns out that walking after meals is a great strategy for both <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3119587/">weight loss</a> and suppressing your blood sugar levels, among other <a href="https://www.healthline.com/nutrition/walking-after-eating#benefits">benefits</a>. This can be as short as a <a href="https://www.cnn.com/2022/09/02/health/walking-blood-sugar-study-wellness/index.html">2-minute walk</a>, according to recent studies. In fact, it’s seen as so key in Mediterranean culture that walking is considered a component of the Mediterranean diet.</p>
<p>Overall, I’ve increased my active calorie burn by <strong>250 calories/day</strong> by incorporating active walks into my day. That’s a combination of the 2 after-meal brisk walks, plus a more relaxed walk on my under-desk treadmill sometime during the day. The latter is typically a 2 mph walk for 40–60 min, and I do it while I’m in a meeting that I’m not leading, or maybe watching a webinar. Without buying the walking pad, you could do the same on a nice outdoor walk with a headset or earbuds, but Minnesota weather sometimes makes that miserable. All of this typically gets me somewhere between 10,000 and 15,000 steps per day. </p>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/02/b1d5cfb4-40b5-4270-b436-f0a8c48df4c3.jpeg"><img alt="" class="wp-image-1428" height="300" src="https://dberkholz.files.wordpress.com/2024/02/b1d5cfb4-40b5-4270-b436-f0a8c48df4c3.jpeg?w=300" width="300" /></a></figure>
<p>Not only is this good for fitness, it also helps to offset the <a href="https://pubmed.ncbi.nlm.nih.gov/33677461/">effects of metabolic adaptation</a>. If you’re losing weight, your body consumes fewer calories because it decreases your resting metabolic rate to conserve energy. Although some sites will suggest this could be hundreds of calories daily, which is quite discouraging, <a href="https://scholar.google.com/scholar?hl=en&as_sdt=0,5&q=metabolic+adaptation+weight+loss&btnG=">research</a> shows that’s exaggerated for most people. During active weight loss, it’s typically ~100 calories per day, although it may be up to 175±150 calories for <a href="https://www.sciencedirect.com/science/article/pii/S0002916522003276">diet-resistant people</a>. That range is a standard deviation, so people who are in the worst ~15% of the diet-resistant subset could have adaptations >325 calories/day. So if you believe you’re diet-resistant, you probably want to aim for a 1000-calorie deficit, to ensure you’re able to lose weight at a good rate. On the bright side, that adaptation gets cut in half once you’ve stabilized for a few weeks at your new weight, and it’s effectively back to zero a year later.</p>
<p>To further <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6489119/">maintain my muscle</a> following weight loss, I added a weighted vest to my after-lunch walks occasionally (examples: <a href="https://www.roguefitness.com/rogue-plate-carrier">Rogue</a>, <a href="https://www.roguefitness.com/5-11-tactec-trainer-weight-vest">5.11</a>, <a href="https://www.trxtraining.com/products/trx-weight-vest">TRX</a>). I started doing this once a week, and I aim to get to 3x+/week. I use a 40 lb weighted vest to counterbalance the 40+ lb of weight that I’ve lost. When I walk with the vest, I’m careful to maintain the same pace as without the vest, which increases the intensity and my heart rate. This pushes a normal moderate-intensity walk into the low end of high intensity (approaching 80% of my max heart rate). I also anticipate incorporating this weighted vest into my strength training later, once my own body weight is insufficient for continued progression. </p>
<p>Considering a minimalist approach, however, I think you could do just fine without a weighted vest. There are other ways to increase intensity, such as speed or inclines, and the combination of a high-protein diet, HIIT, and strength training provides similar benefits.</p>
<h4 class="wp-block-heading">HIIT — High-Intensity Interval Training</h4>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/02/4e70d0f9-1d73-447b-ad3a-13573508b8a3.jpeg"><img alt="" class="wp-image-1430" height="300" src="https://dberkholz.files.wordpress.com/2024/02/4e70d0f9-1d73-447b-ad3a-13573508b8a3.jpeg?w=300" width="300" /></a></figure>
<p>Why do HIIT? Regularly getting your heart rate close to its maximum is good for your cardiovascular health, and you can’t do it with LISS, which by definition is low intensity. Another option besides HIIT is much longer moderate-intensity continuous training (your classic aerobic workout), but HIIT can fit the same benefits or more into a fraction of the time.</p>
<p>Research is very supportive of HIIT compared to longer aerobic workouts, allowing the total workout to be compressed from the classic 60 minutes down to 30 minutes or less. </p>
<p>However, 30 minutes still isn’t the least you can do and still get most of the benefits. The minimum required HIIT remains unclear — in overall length, weekly frequency, as well as patterns of high-intensity and rest / low-intensity. Here are some examples of research that test the limits of minimalist HIIT and find that it still works well:</p>
<ul>
<li>1x 4 min, 3x/wk: <a href="https://physoc.onlinelibrary.wiley.com/doi/epdf/10.1113/JP281210">review</a></li>
<li>5x 1 min, 2x/wk: <a href="https://www.nature.com/articles/s41598-021-82372-4">study</a>, <a href="https://translational-medicine.biomedcentral.com/articles/10.1186/s12967-020-02592-6">study</a>, <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9565952/">study</a></li>
<li>4x 1 min, 3x/wk: <a href="https://psycnet.apa.org/record/2019-04263-008">study</a></li>
<li>8x 20 sec, 4x/wk (i.e. the <a href="https://jps.biomedcentral.com/articles/10.1007/s12576-019-00676-7">Tabata protocol</a>): <a href="https://www.scirp.org/journal/paperinformation?paperid=39842">study</a></li>
</ul>
<p>Yes, you read that right — the last study used 20-second intervals. They were only separated by 10 seconds of rest, so the primary exercise period was just 4 minutes, excluding warm-up. Furthermore, this <a href="https://cdnsciencepub.com/doi/abs/10.1139/apnm-2023-0329?journalCode=apnm">meta-analysis</a> suggests that HIIT benefits more from increasing the intensity of the high-intensity intervals, rather than increasing the volume of repetitions.</p>
<p>After my investigation, it was clear that “low-volume” or “extremely low volume” HIIT could work well, so there was no need to do the full 30-minute HIIT workouts that are popular with many gym chains. </p>
<p>I settled on 3 minutes of HIIT, 2x/week: 3 repetitions of 30 seconds hard / 30 seconds light, plus a 1-minute warm-up. This lines up with the interval lengths, breaks, and repetition counts from the research I’ve dug into, and it also has the convenient benefit of not quite making me sweat during the workout, so I don’t need to change clothes. </p>
<p>I’m seeing the benefits of this already, which I’ll discuss in the Summary.</p>
<h3 class="wp-block-heading">Strength Training</h3>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/02/0a3a6454-f04f-45d1-8f10-8bb074179a7b.jpeg"><img alt="" class="wp-image-1432" height="300" src="https://dberkholz.files.wordpress.com/2024/02/0a3a6454-f04f-45d1-8f10-8bb074179a7b.jpeg?w=300" width="300" /></a></figure>
<p>I also wanted to incorporate strength training for many reasons. In the short term, it was to minimize muscle loss as I lost weight (addressed in my <a href="https://dberkholz.com/2024/01/17/the-lazy-technologists-guide-to-weight-loss/">prior post</a>). In the medium and long term, I want to build muscle now so that I can live a healthier life once I’m older and also feel better about myself today.</p>
<p>What I’ve found is that aiming for the range of 10%–15% body fat is ideal for men who want to be very fit. This range makes it easy to tell visually when you’re at the top or bottom of the range, based on the appearance of a well-defined six-pack or its fading away to barely visible. It gets harder to tell where you are visually from 15% upwards, while anything below 10% has some health risks and starts to look pretty unusual too.</p>
<p>Within that 10%–15% range, I’m planning to do occasional short-term “lean bulks” / “<a href="https://www.healthline.com/nutrition/clean-bulk">clean bulks</a>” and “cuts.” That’s the typical approach to building muscle — you eat a slight excess of calories while ensuring plenty of protein, aiming to gain about 2–4 lbs/month for someone my size. After a cycle of doing this, you then “cut” by dieting to lose the excess fat you’ve gained, because it’s impossible to only gain muscle. My personal preference is to make this cycle more agile with shorter iteration cycles, compared to some of the examples I’ve seen. I’m thinking about a 3:1 bulk:cut split over 4 months that results in a total gain/loss of ~10 lbs.</p>
<h4 class="wp-block-heading">Calisthenics (bodyweight exercises): the minimalist’s approach</h4>
<p>My goal of staying minimal pushed me toward calisthenics (bodyweight exercises), rather than needing to work out at a gym or buy free weights. This means the only required equipment is a doorway <a href="https://www.amazon.com/dp/B09BDHV7FW/">pull-up bar</a> ($25), while everything else can be done with a wall, table or chair/bench. Although I may not build enormous muscles, it’s possible to get to the point of lifting your entire body weight with a single arm, which is more than good enough for me. That’s effectively lifting 2x your body weight, since you’re lifting 1x with just one arm.</p>
<p>My routine is inspired by Reddit’s <a href="https://www.reddit.com/r/bodyweightfitness/">r/bodyweightfitness</a> (including the <a href="https://www.reddit.com/r/bodyweightfitness/wiki/kb/recommended_routine/">Recommended Routine</a> and the <a href="https://www.reddit.com/r/bodyweightfitness/wiki/minroutine/">Minimalist Routine</a>) and this <a href="https://stevenlow.org/the-fundamentals-of-bodyweight-strength-training/">blog post</a> by Steven Low, author of the book “Overcoming Gravity.” I’ve also incorporated scientific research wherever possible to guide repetitions and frequency. Overall, the goal is to get both horizontal and vertical pushing and pulling exercises for the arms/shoulders due to their larger range of motion, while getting push and pull for legs, and good core exercises that cover both the upper and lower back as well. </p>
<p>I’ve chosen <strong>compound exercises</strong> that work many muscles simultaneously — for practicality (more applicable to real-world motions), length of workout, and minimal equipment needs. If you’re working isolated muscles, you generally need lots of specialized machines at a gym. Isometrics (exercises where you don’t move, like a wall-sit) are also less applicable to real use cases as you age, such as the strength and agility to catch yourself from a fall. For that reason, I prefer compound exercises with some rapid, explosive movements that help to build both strength and agility.</p>
<h4 class="wp-block-heading">My initial routine</h4>
<p>Here’s my current schedule (3 sets of repetitions for each movement, with a <a href="https://www.bodybuilding.com/content/what-is-the-optimal-time-between-sets-for-muscle-growth.html">3-minute break</a> between sets):</p>
<ul>
<li><strong>Monday</strong>: arm push — push-ups (as HIIT) and tricep dips. “As HIIT” means that I’ll do as many push-ups as I can fit within my HIIT pattern, then flip to my active work (e.g. jumping jacks or <a href="https://www.healthline.com/health/how-to-do-a-burpee">burpees</a>).</li>
<li><strong>Tuesday</strong>: arm pull — pull-ups (with L-sit, as below) and <a href="https://www.healthline.com/health/fitness/australian-pull-up">inverted rows</a> (“Australian pull-ups”)</li>
<li><strong>Wednesday</strong>: core — <a href="https://gmb.io/l-sit/">L-sits</a>, planks (3x — <a href="https://www.businessinsider.com/sports-scientist-says-there-is-no-point-in-holding-plank-for-long-time-2018-3">10 sec</a> on each of front, right, left)</li>
<li><strong>Thursday</strong>: handstands — working toward <a href="https://gmb.io/handstand-push-up/">handstand push-ups</a> as the “vertical push”</li>
<li><strong>Friday</strong>: legs — squats (as HIIT), and Nordic curls (hamstrings & lower back)</li>
<li><strong>Saturday/Sunday</strong>: rest — just walking. Ideally hitting 10k steps/day but no pressure to do so, if I’m starting to feel sore.</li>
</ul>
<p>For exercises I couldn’t do initially (e.g. pull-ups, handstands, L-sits, Nordic curls), I used progressions to work my way there step by step. For pull-ups, that meant doing negatives / eccentrics: jumping up and slowly lowering myself down over multiple seconds, then repeating. For handstands, I face the wall to encourage better posture, so it’s been about longer holds and figuring out how to bail out so I can more confidently get vertical. For L-sits, I follow <a href="https://gmb.io/l-sit/">this progression</a>. For Nordic curls, I’m doing slow negatives as far down as I can make it, then dropping the rest of the way onto my hands and pushing back up.</p>
<p>On days with multiple exercises for the same muscles, I’ll typically try to split them up so they fit more easily into a workday. For example, I’ll find 10 minutes mid-morning between meetings/calls to do one movement and 10 minutes mid-afternoon for the other. This is the same time I might’ve spent making a coffee, before I started focusing on fitness.</p>
<p>Combined with the walks, this plan gets me moving 4 times a day — two 20-minute walks and two 10-minute workouts, for a total of 1 hour each day. The great thing about this approach is that I never feel like I need to dedicate a ton of time to exercise, because it fits naturally into the structure of my day. I’ve also got an additional 40–60 minutes of slow walking while at my desk, which again fits easily into my day.</p>
<h4 class="wp-block-heading">What I’ve learned along the way</h4>
<figure class="wp-block-image size-large"><a href="https://dberkholz.files.wordpress.com/2024/02/the_more_you_know.gif"><img alt="" class="wp-image-1434" height="185" src="https://dberkholz.files.wordpress.com/2024/02/the_more_you_know.gif?w=305" width="305" /></a></figure>
<p>As you can see, I’m currently at 1x/wk for non-core exercises, which is a “traditional split.” That means I’m splitting up exercises, focusing on just one set of muscles each day. The problem is that the frequency of training for each muscle group is low, which I’d like to change so that I can build strength more quickly. </p>
<p>I’m switching to “paired sets” (aka “alternating sets”) that alternate among different muscle groups, so I can fit more into the same amount of time. Here’s how that works: if you were taking a 3-minute rest between sets, that gives you time to fit in an unrelated set of muscles that you weren’t using in the first exercise (e.g. biceps & triceps, quads & hamstrings, chest & back). I do this as an alternating tri-set (arm pull, arm push, legs) with a 30–45 second rest between each muscle group, and a 1.5–2 minute break between each full tri-set. You might also see “supersets,” which is a similar concept but with no breaks within the tri-set. I’ve found that I tend to get too tired and sloppy if I try a superset, so I do alternating sets instead.</p>
<p>In addition, I’ve done a lot more research on strength training after getting started. For LISS and HIIT, I had a strongly research-driven approach before beginning. For strength training, I went with some more direct recommendations and only did additional academic research later. Here’s what I’ve learned since then:</p>
<ul>
<li>Higher-load (80%+), multi-set workouts 2x/week are <a href="https://bjsm.bmj.com/content/57/18/1211.abstract">optimal</a> for maximizing both strength and hypertrophy, according to a 2023 meta-analysis.</li>
<li>The ideal set size seems to be 6–8 repetitions, with a <a href="https://www.bodybuilding.com/content/what-is-the-optimal-time-between-sets-for-muscle-growth.html">3-minute break</a> between sets to restore energy. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7927075/">6–8 reps appears to be a sweet spot</a> between strength and hypertrophy (muscle size); for endurance, aim for 15+ repetitions. To build all of those qualities, alternate rep counts with different loads.</li>
<li><a href="https://link.springer.com/article/10.1007/s40279-021-01490-1">Time efficient workout design</a>: Use compound exercises and include both concentric & eccentric movements. Perform a minimum of one leg-pressing exercise (e.g. squats), one upper-body pulling exercise (e.g. pull-up) and one upper-body pushing exercise (e.g. push-up). Perform a minimum of 4 weekly sets per muscle group using a 6–15 rep max loading range.</li>
<li><a href="https://link.springer.com/article/10.1007/s00421-022-05035-w">Eccentric / negatives are superior to concentric</a>. Don’t neglect or rush through the negatives / eccentrics. That’s the part of an exercise you ignore by default — letting your weight come down during a squat, pull-up, or push-up rather than when you’re pushing/pulling it back up. Take your time on that part, because it’s actually <strong>more</strong> important.</li>
<li>Doing something as quick as <a href="https://onlinelibrary.wiley.com/doi/10.1111/sms.14138">3-second negatives</a>, 4x/wk, will improve strength.</li>
</ul>
<p>Overall, that suggests a workout design that looks like this (2 days a week):</p>
<ul>
<li>2+ sets of each: Compound exercises for arm push, arm pull, leg press</li>
<li>Aim for whatever difficulty is required to max out at 6–8 repetitions for strength & hypertrophy (muscle size), or up to 15 if you’re focusing on endurance</li>
<li>Do slow eccentrics / negatives on every exercise</li>
</ul>
<h4 class="wp-block-heading">The new routine</h4>
<p>To incorporate this research into a redesigned routine that also includes HIIT and core work, here’s what I’ve recently changed to (most links go to “progressions” that will help you get started):</p>
<ul>
<li><strong>Monday</strong>: Strength: push-ups, <a href="https://gmb.io/pull-ups/">pull-ups</a>, <a href="https://gmb.io/pistol-squat/">squats</a> as alternating set</li>
<li><strong>Tuesday</strong>: HIIT (<a href="https://www.healthline.com/health/how-to-do-a-burpee">burpees</a>, mountain climbers, star jumps, etc)</li>
<li><strong>Wednesday</strong>: Core & Flexibility: <a href="https://gmb.io/l-sit/">L-sits</a>, planks, Nordic curls, stretches</li>
<li><strong>Thursday</strong>: HIIT (similar routine)</li>
<li><strong>Friday</strong>: Strength: <a href="https://gmb.io/handstand-push-up/">handstand push-ups</a>, <a href="https://www.healthline.com/health/fitness/australian-pull-up">inverted rows</a>, squats as alternating set</li>
<li><strong>Saturday/Sunday</strong>: Rest days</li>
</ul>
<p>Also, 4+ days a week, I do a quick set of a 5-second negative for each type of compound exercise (arm push, arm pull, leg press). That’s just 2 days in addition to my strength days, so I usually fit it into HIIT warm-up or cool-down.</p>
<p>On each day, my overall expected time commitment will be about <strong>10 minutes</strong>. For strength training, all the alternating sets will overlap with each other. Even with a 3-min break between each set for the same muscle group, that should run quite efficiently for 2–3 sets. For HIIT, it’s already a highly compressed routine that takes ~5 minutes including warm-up and cool-down, but I need another 5 minutes afterwards to decompress after exercise that intense. You may notice that I only have one dedicated day to work my core (Wednesday), but I’m also getting core exercise during push-ups (as I plank), L-sit pull-ups, and handstands (as I balance).</p>
<p>The research recommendation to increase load to 80% of your max can seem more challenging with calisthenics, since it’s just about bodyweight. However, it’s always possible by decreasing your leverage, using one limb instead of two, or increasing the proportion of your weight that’s applied by changing your body angles. For example, you can do push-ups at a downwards incline with your feet on a bench/chair. You can also do more advanced types of squats like Bulgarian split squats, shrimp squats, or pistol squats.</p>
<h3 class="wp-block-heading">Summary</h3>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/01/eeadcb95-2794-4e0d-9609-94c53e85b9cd.jpeg"><img alt="" class="wp-image-1411" height="300" src="https://dberkholz.files.wordpress.com/2024/01/eeadcb95-2794-4e0d-9609-94c53e85b9cd.jpeg?w=300" width="300" /></a></figure>
<p>My cardiorespiratory fitness, as measured by VO2 Max (maximal oxygen consumption) on my Apple Watch, has increased from 32 (the lowest end of “below average,” for my age & gender) to 40.1 (above average). It continues to improve on a nearly daily basis. That’s largely happened within just a <strong>couple of months</strong>, since I started walking every day and doing HIIT. </p>
<p>My blood pressure (one of my initial concerns) has dropped out of pre-hypertension into the healthy range. My resting heart rate has also decreased from 63 to 56 bpm, which was a long slow process that’s occurred over the entire course of my weight loss.</p>
<p>On the strength side, I wasn’t expecting any gains because I’m in a caloric deficit. My main goal was to avoid losing muscle while losing weight. I’ve now been strength training for 2.5 months, and <strong>I’ve been pleasantly surprised by the “newbie gains”</strong> (which people often see in their first year or two of strength training). </p>
<p>For example, I couldn’t do any pull-ups when I started. I could barely do a couple of negatives, by jumping up and letting myself down slowly. Now I can do 4 pull-ups (neutral grip). Also, I can now hold a wall handstand for 30–45 seconds and do 6–8 very small push-ups, while I could barely get into that position at all when I started. </p>
<p>Overall, clear results emerged almost instantly for cardiorespiratory fitness, and as soon as 6 weeks after beginning a regular strength-training routine. <strong>If you try it out, let me know how it works for you!</strong></p>2024-02-16T20:20:04+00:00Donnie Berkholz: The lazy technologist’s guide to weight loss
https://dberkholz.com/2024/01/17/the-lazy-technologists-guide-to-weight-loss/
<p><em>[Last update: 2024-02-16]</em></p>
<p><strong>In the past 8 months, I’ve lost 60 pounds</strong> and gone from completely sedentary to much more fit, while putting in a minimum of effort. I have no desire to be a bodybuilder, but I want to be in great shape now and stay as healthy and mobile as possible well into old age. A year ago, my blood pressure was already at pre-hypertension levels, despite my relatively young age. </p>
<p>I wasn’t willing to let this last any longer, and I wasn’t willing to accept that future.</p>
<p><a href="https://www.ahajournals.org/doi/10.1161/CIRCULATIONAHA.117.032047">Research shows</a> that 5 factors are key to a long life — correlated with extending your life by 12–14 years:</p>
<ul>
<li>Never smoking</li>
<li>BMI (body mass index) of 18.5–24.9</li>
<li>30+ min a day of moderate/vigorous exercise</li>
<li>Moderate alcohol intake (vs none, occasional, or heavy)</li>
<li>Diet quality in the upper 40% (Alternate Healthy Eating Index)</li>
</ul>
<p>In addition, people who are in good health have a much shorter end-of-life period. This means they extend the healthy portion of their lifespan (the “<a href="https://www.nature.com/articles/s41536-021-00169-5">healthspan</a>”) and compress the worst parts into a shorter period at the very end. Having seen many grandparents go through years of struggle as they grew older, I wanted my own story to have a different ending.</p>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/01/16198ec8-639a-4c64-bba6-34bb4f0a3783.jpeg"><img alt="" class="wp-image-1408" height="300" src="https://dberkholz.files.wordpress.com/2024/01/16198ec8-639a-4c64-bba6-34bb4f0a3783.jpeg?w=300" width="300" /></a></figure>
<p>Although I’m not a smoker, I was missing three of the other factors. My weight was massively unhealthy, I didn’t exercise at all and spent most of my day in front of a desk, and my diet was awful. On the bright side for these purposes, I drink moderately (almost entirely beer).</p>
<p>In this post, I’ll walk through my own experience going from obese to a healthy weight, with plenty of research-driven references and data along the way.</p>
<p><strong>Why is this the lazy technologist’s guide, though?</strong> I wanted to lose weight in the “laziest” way possible — in the same sense that lazy programmers find the most efficient solutions to problems, according to an apocryphal quote by Bill Gates and a real one by Larry Wall, creator of Perl. Gates supposedly said, “I choose a lazy person to do a hard job. Because a lazy person will find an easy way to do it.” Wall wrote in <em>Programming Perl</em>, “Laziness: The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful and document what you wrote so you don’t have to answer so many questions about it.”</p>
<p>What’s the lowest-effort, most research-driven way to lose weight as quickly as possible without losing health? Discovering and executing upon that was my journey. Read on if you’re considering taking a similar path.</p>
<h3 class="wp-block-heading">My weight-loss journey begins</h3>
<p>My initial goal was to get down from 240 pounds (obese, BMI of 31.7) into the healthy range, reaching 185 pounds (BMI of 24.4). </p>
<p>My aim was to lose at the high end of a healthy rate, 2 pounds per week. Credible sources like the <a href="https://www.mayoclinic.org/healthy-lifestyle/weight-loss/in-depth/weight-loss/art-20047752">Mayo Clinic</a> and the <a href="https://www.cdc.gov/healthyweight/losing_weight/index.html">CDC</a> suggested aiming for 1–2 pounds a week, because anything beyond that can cause issues with muscle loss as well as malnutrition.</p>
<p>But how could I accomplish that?</p>
<h3 class="wp-block-heading">One weird trick — Eat less</h3>
<p>I’ve lost weight once previously (about 15 years ago), although it was a smaller amount. Back then, I learned that there’s no silver bullet — the trick is to create a calorie deficit, so that your body consumes more energy than the calories in what you eat. </p>
<p>Every pound of body fat is about 3,500 calories, which helps to set a weekly and daily goal for your calorie deficit. For me to lose 2 pounds a week, that’s 2*3500 = 7000 calories/week, or 1000 calories/day of deficit (eating that much less than my body uses).</p>
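The deficit arithmetic above can be written out as a quick sketch (Python here; the helper names `daily_deficit` and `target_intake` are mine, not from the post):

```python
# Rough calorie-deficit math, using the rule of thumb that one pound
# of body fat is ~3,500 calories. All numbers are estimates.
CALORIES_PER_POUND = 3500

def daily_deficit(pounds_per_week):
    """Daily calorie deficit needed to lose `pounds_per_week`."""
    return pounds_per_week * CALORIES_PER_POUND / 7

def target_intake(daily_burn, pounds_per_week):
    """Calories to eat per day, given an estimated daily burn."""
    return daily_burn - daily_deficit(pounds_per_week)

# The post's numbers: 2 lb/week at a ~2,450 cal/day sedentary burn
print(daily_deficit(2))        # 1000.0 calories/day of deficit
print(target_intake(2450, 2))  # 1450.0 calories/day to eat
```

The same two functions also show why the plateau described later happens: as `daily_burn` drops with your weight, `target_intake` has to drop too.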
<h3 class="wp-block-heading">Exercise barely makes a dent</h3>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/01/97ec02c7-64d5-4bca-9775-3149339dd983.jpeg"><img alt="" class="wp-image-1409" height="300" src="https://dberkholz.files.wordpress.com/2024/01/97ec02c7-64d5-4bca-9775-3149339dd983.jpeg?w=300" width="300" /></a></figure>
<p>It’s far more effective and efficient to create this deficit primarily through eating less rather than expecting exercise to make a huge difference. If you were previously gaining weight, you might’ve been eating 3000 calories/day or more! You can easily reduce what you eat by 1500 calories/day from that starting point, but it’s almost impossible to exercise enough to burn that many calories. An hour of intense exercise might burn 500 calories, and it’s very hard to keep up that level of effort for even one full hour — especially if you’ve been sitting in a chair all day for years on end.</p>
<p>Not to mention, that much exercise would defeat the whole idea of this being the lazy person’s way of making progress.</p>
<p>So how exactly can you reduce calories? You’ve got a lot of options, but they basically boil down to two things — <strong>eat less (portion control), and eat better (food choice)</strong>.</p>
<h3 class="wp-block-heading">The plan</h3>
<p>At this point, I knew I needed to eat 1000 calories/day less than I burned. I used this <a href="https://www.calculator.net/calorie-calculator.html">calculator</a> to identify that, as a sedentary person, I burned about 2450 calories/day. So to create that deficit, I needed to eat about 1450 calories/day. At that point, I was probably eating 2800–3000 calories/day, so that would require massive changes in my diet.</p>
<p>I don’t like the idea of fad diets that completely remove one or many types of foods entirely (Atkins, keto, paleo, etc), although they can work for other people. One of the big lessons about dieting is that as long as you’re removing <strong>something</strong> from what you eat, you’ll probably lose weight. </p>
<p>I decided to make two big changes: how often I ate healthy vs unhealthy food, and when I ate over the course of the day. At the time, I was eating a huge amount of high-fat, high-sugar, and low-health foods like burgers and fries multiple times per week, fried food, lots of chips/crisps, white bread (very high sugar in the US) & white rice, cheese, chocolate and candy. </p>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/01/b4248fd8-2efc-46e0-aa9f-e94f9afe8831.jpeg"><img alt="" class="wp-image-1402" height="300" src="https://dberkholz.files.wordpress.com/2024/01/b4248fd8-2efc-46e0-aa9f-e94f9afe8831.jpeg?w=300" width="300" /></a></figure>
<p>I decided to shift that toward white meat (chicken/pork/turkey), seafood, salads & veggies, and whole grains (whole-wheat bread, brown rice, quinoa, etc). One pro-tip: American salad dressings are super unhealthy, often even the “vinaigrettes” that sound better. <a href="https://www.lacucinaitaliana.com/italian-food/hacks/how-to-dress-a-salad-italian-style">Do like Italians do</a>, and dress salads yourself with olive oil, salt, and vinegar. However, I didn’t want to remove my favorite foods entirely, because that would destroy my long-term motivation and enjoyment of my progress. For example, once a week, I still allow myself to get a cheeseburger. But I’ll typically get a single patty, no mayo/cheese/ketchup, and with a side like salad (w/ healthy dressing) or cole slaw. I’ll also ensure my other meal of the day is very light. Many days, I’ll enjoy a small treat like 1–2 chocolates, as well (50–100 calories).</p>
<h3 class="wp-block-heading">What if you like beer?</h3>
<p>I wanted to reach my calorie target without eliminating beer, so I could both preserve my quality of life and also maintain the moderate drinking that research shows is correlated with increased lifespan. </p>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/01/695880dc-4ba1-450e-937b-70cf52a9d955.jpeg"><img alt="" class="wp-image-1404" height="300" src="https://dberkholz.files.wordpress.com/2024/01/695880dc-4ba1-450e-937b-70cf52a9d955.jpeg?w=300" width="300" /></a></figure>
<p>I was also drinking very high-calorie beer (like double IPAs and bourbon-barrel–aged imperial stouts). I shifted that toward low-alcohol, low-calorie beer (alcohol levels and calories are correlated). Bell’s Light-Hearted IPA and Lagunitas DayTime IPA are two pretty good ones in my area. Of the non-alcoholic (NA) beers, Athletic Free Wave Hazy IPA is the best I’ve found in my area, but Untappd has reasonably good ratings for Sam Adams Just the Haze and Sierra Nevada Trail Pass IPA, which should be broadly available. As a rough estimate on calories in beer, you can use this formula:</p>
<p class="has-text-align-center"><em>Beer calories = ABV (alcohol percentage) * 2.5 * fluid ounces</em></p>
<p>As an exception, many Belgian beers are quite “efficient” to drink, in that roughly 75% of the calories are alcohol rather than other carbs that just add calories. As a result, they violate the above formula and tend to be lower-calorie than you’d expect. This could be the result of carefully crafted recipes that consume most of the carbs, and fermentation that uses up all of the sugar. </p>
<p>Here’s a more specific formula that you can use, if you’re curious about how “efficient” a given beer is, and you know how many total calories it has (find this online):</p>
<p class="has-text-align-center"><em>Beer calories from ethanol = (ABV * 0.8 / 100) * (29.6 * fluid ounces) * 7</em></p>
<p class="has-text-align-center"><em>(Simplified form): Beer calories from ethanol = ABV * 1.7 * fluid ounces</em></p>
<p>This uses the density and calories of ethanol (0.8 g/ml and 7 cal/g, respectively) and converts from milliliters to ounces (29.6 ml/oz). If you then calculate that number as a fraction of the total calories in a beer, you can find its “efficiency.” For example, a 12-ounce bottle of 8.5% beer might have 198 calories total. Using the equation, we can calculate that it’s got 169 calories from ethanol, so 169/198 = 85% “efficient.”</p>
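The two formulas and the worked example can be checked with a short script (a sketch; the function names `ethanol_calories` and `efficiency` are my own):

```python
# Calories from ethanol in a beer, from ethanol's density (0.8 g/ml),
# its energy content (7 cal/g), and 29.6 ml per fluid ounce.
def ethanol_calories(abv_percent, fluid_ounces):
    grams_ethanol = (abv_percent / 100) * 0.8 * 29.6 * fluid_ounces
    return grams_ethanol * 7

def efficiency(abv_percent, fluid_ounces, total_calories):
    """Fraction of a beer's total calories that come from ethanol."""
    return ethanol_calories(abv_percent, fluid_ounces) / total_calories

# The worked example from the text: 12 oz of an 8.5% beer at 198 calories
print(round(ethanol_calories(8.5, 12)))       # 169 calories from ethanol
print(round(efficiency(8.5, 12, 198) * 100))  # 85 percent "efficient"
```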
<p>If you’re really trying to optimize for this, however, <a href="https://www.getdrunknotfat.com/">beer is the wrong drink</a>. Have a low-calorie mixed drink instead, like a vodka soda, ranch water, or rum and Diet Coke.</p>
<h3 class="wp-block-heading">The plan (part 2)</h3>
<p>Therefore, instead of giving up beer entirely, I decided to skip breakfast. I’d eaten light breakfasts for years (a small bowl of cereal, or a banana and a granola bar), so this wasn’t a big deal to me. </p>
<p>Later, I discovered this qualified my diet as time-restricted intermittent fasting as well, since I was only eating/drinking between ~12pm–6pm. This approach of 18 hours off / 6 hours on (18:6 fasting) <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7021351/">may have aided</a> in my weight loss, but studies are mixed with some suggesting <a href="https://www.ahajournals.org/doi/10.1161/JAHA.122.026484">no effect</a>.</p>
<p>Here’s what a day might look like on 1450 calories:</p>
<ul>
<li>Lunch (400 calories). A tuna-salad sandwich (made with Greek yogurt instead of mayo) on whole-wheat bread, and a side salad with olive oil & vinegar.</li>
<li>Afternoon snack (150 calories). Sliced bell peppers, no dip, and a small bowl of cottage cheese.</li>
<li>A treat (50–100 calories). A truffle or a couple of small chocolates as an afternoon treat.</li>
<li>Dinner (650 calories). Fried chicken/fish sandwich (or kids-size burger) and a small order of fries, from a fast-casual restaurant.</li>
<li>One or two low-alcohol, light, or NA beers (150–200 calories).</li>
</ul>
<p>When I get hungry, I often drink some water instead, because my body’s easily confused about hunger vs thirst. It’s a mental game too — I remind myself that hunger means my body is burning fat, and that’s a good thing.</p>
<p>For a long time, I kept track of my estimated calorie consumption mentally. More recently, I decided to make my life a little easier by switching to an app. I chose MyFitnessPal because it’s got a big database including almost everything I eat.</p>
<p>On this plan, I had a great deal of success in losing my first 40 pounds, getting down from 240 to 200. However, it started to feel like a bit of a struggle to maintain my weight loss as I reached 200 pounds and wanted to continue losing at the same rate of 2 pounds/week.</p>
<h3 class="wp-block-heading">Adaptation, plateaus and persistence</h3>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/01/eeadcb95-2794-4e0d-9609-94c53e85b9cd.jpeg"><img alt="" class="wp-image-1411" height="300" src="https://dberkholz.files.wordpress.com/2024/01/eeadcb95-2794-4e0d-9609-94c53e85b9cd.jpeg?w=300" width="300" /></a></figure>
<p>I fell behind by about two weeks on my weight-loss goal, which was massively frustrating because I’d done so well all along. I convinced myself to keep persisting because it had worked all along for months, and this was a temporary setback.</p>
<p>Finally, I re-ran the same calorie calculator and realized what seemed obvious in hindsight: since I now weighed less, I also burned fewer calories per day! Those 40 pounds that were now gone didn’t use any energy anymore, but I was still eating as if I had them. I needed to change something to restore the 1000-calorie daily deficit. </p>
<p>At this point, I aimed to decrease my intake to about 1200 calories per day. This quickly became frustrating because it started to affect my quality of life by forcing choices I didn’t want to make, such as choosing between a decent dinner or a beer, or forcing me to eat a salad with no protein for dinner if I had a little bit bigger lunch.</p>
<p>That low calorie limit also carried the risk of causing <a href="https://journals.lww.com/nsca-jscr/fulltext/2022/10000/metabolic_adaptations_to_weight_loss__a_brief.39.aspx">metabolic adaptation</a> — meaning my body could burn hundreds fewer calories per day as a result of being in a “starvation mode” of sorts. That ends up being a vicious cycle that continually forces you to eat less, and it makes weight loss even more challenging.</p>
<p>Consequently, I began to introduce moderate exercise (walking), so I could bring my intake back up to 1400 calories on days when I burned 200 extra calories. I’ve discussed the details in a <a href="https://dberkholz.com/2024/02/05/the-lazy-technologists-guide-to-fitness/">follow-up guide for fitness</a>.</p>
<p>Over the course of my learning, I discovered that it’s ideal (according to actuarial tables) to sit in the middle of the healthy range rather than be at the top of it. I maintained my initial weight-loss goal to keep myself motivated on progress, but set a second goal of reaching 165 pounds — or whatever weight it takes to get a six-pack (~10% body fat).</p>
<h3 class="wp-block-heading">Eat lots of protein</h3>
<p>I also discovered that <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8308821/">high-protein diets are better at preserving muscle</a>, so more of the weight loss is fat. This is especially true when coupled with resistance or strength training, which also sends your body a signal that it needs to keep its muscle instead of losing it. The minimum recommended daily allowance (RDA) of protein (<strong>0.36 grams per pound</strong> of body weight, or 67 g/day for me) could be your absolute lower limit, while as much as 0.6 g/lb (111 g/day for me) could help in improving your muscle mass. </p>
<p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5421125/">Another study</a> suggested <strong>multiplying the RDA by 1.25–1.5</strong> (or more if you exercise) to maintain muscle during weight loss, which would put my recommended protein at 84–100 grams per day. The same study also said exercise helps to maintain muscle during weight loss, so it could be an either/or situation rather than needing both. Additionally, <a href="https://www.healthline.com/nutrition/how-protein-can-help-you-lose-weight">high-protein diets can help with hunger and weight loss</a>, in part because they keep you fuller for longer. Getting <strong>25%–30% of daily calories from protein</strong> will get you to this level, which is a whole lot of protein. Starting from your overall daily calories, you can apply this percentage and then divide your desired protein calories by 4 to get the number of grams per day:</p>
<p class="has-text-align-center"><em>Protein grams per day = Total daily calories * {25%, 30%} / 4</em></p>
<p>For my calorie limit, that’s about 88–105 grams per day. </p>
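Putting the protein rules of thumb together (a sketch assuming the post’s ~1,400 cal/day limit and the 185-pound goal weight; the helper names are mine):

```python
# Protein targets from the text's rules of thumb. Protein has
# 4 calories per gram; the RDA is 0.36 g per pound of body weight.
def protein_from_calories(daily_calories, fraction):
    """Grams/day needed to get `fraction` of calories from protein."""
    return daily_calories * fraction / 4

def protein_rda(body_weight_lb, multiplier=1.0):
    """Grams/day at `multiplier` times the 0.36 g/lb RDA."""
    return body_weight_lb * 0.36 * multiplier

# Assumed numbers: ~1,400 cal/day limit and 185 lb body weight
print(round(protein_from_calories(1400, 0.25)))  # 88 g (25% of calories)
print(round(protein_from_calories(1400, 0.30)))  # 105 g (30% of calories)
print(round(protein_rda(185)))                   # 67 g (minimum RDA)
```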
<p>I’ve found that eating near the absolute minimum recommended protein level (67 grams per day, for my weight) tends to happen fairly naturally with my originally planned diet, while getting much higher protein takes real effort. I needed to identify low-calorie, high-protein foods and incorporate them more intentionally into meals, so that I can get enough protein without compromising my daily calorie limit. </p>
<p>Here’s a <a href="https://www.healthline.com/nutrition/high-protein-diet-plan#7-day-sample-meal-plan">good list of low-calorie, high-protein foods</a> that are pretty affordable:</p>
<ul>
<li>Breakfast/Lunch: eggs or low-fat/nonfat Greek yogurt (with honey/berries), </li>
<li>Entree: grilled/roasted chicken (or pork/turkey) or seafood (especially shrimp, canned salmon, canned tuna), and</li>
<li>Sides: cottage cheese or lentils/beans (including soups, to make it an entree).</li>
</ul>
<figure class="wp-block-image size-medium"><a href="https://dberkholz.files.wordpress.com/2024/01/a99bff35-9236-4de5-befb-82d5c1ae5f21.jpeg"><img alt="" class="wp-image-1406" height="300" src="https://dberkholz.files.wordpress.com/2024/01/a99bff35-9236-4de5-befb-82d5c1ae5f21.jpeg?w=300" width="300" /></a></figure>
<p>If you’re vegetarian, you’d want to go heavier on lentils and beans, and add plenty of nuts, including hummus and peanut butter. You probably also want to bring in tempeh, and you likely already eat tofu.</p>
<p>I’d never tried canned salmon before, and I was impressed with how easily I could make it into a salad or an open-faced sandwich (like Danish smørrebrød). The salmon came in large pieces and retained the original texture, as you’d want. Canned tuna has been more variable in terms of texture — I’ve had some great-looking albacore from Genova and some great-tasting (but not initially good-looking) skipjack from Wild Planet.</p>
<p>Avoid the most common brands of canned fish though, like Chicken of the Sea, StarKist, or Bumble Bee. They are often farmed or net-caught instead of pole/line-caught, and they may be higher in parasites (for farmed fish like salmon). I also aim to buy lower-mercury types of salmon and tuna — this means I can eat each kind of fish as often as I want, instead of once a week. I buy canned Wild Planet <strong>skipjack</strong> tuna (not albacore, but yellowfin is pretty good too) and canned Deming’s <strong>sockeye</strong> salmon (not pink salmon) at my local grocery store, and I pick up large trays of refrigerated cocktail shrimp at Costco. The Genova brand also garners good reviews for canned fish and may be easier to find. All of those are pre-cooked and ready to eat, so they’re easy to use for a quick lunch. </p>
<p>Go ahead and get fresh seafood if you want, but be aware that you’ll be going through a lot of it so it could get expensive. Fish only stays good for a couple of days unless frozen, so you’ll also be making a lot of trips to the store or regularly thawing/cooking frozen fish.</p>
<h3 class="wp-block-heading">Summary</h3>
<p>Over the past 8 months, I’ve managed to lose 60 pounds (and counting!) through a low-effort approach that has minimized the overall impact on my quality of life. I’ve continued to eat the foods I want — but less of them.</p>
<p>The biggest challenge has been persistence through the tough times. However, not cutting out any foods completely, but rather just decreasing the frequency of unhealthy foods in my life, has been a massive help with that. That meant I didn’t feel like I was breaking my whole diet whenever I had something I really wanted, as long as it fit within my calorie limit.</p>
<p>What’s next? A few months after beginning my weight loss, I also started working out to get into better shape, which was another one of those original <a href="https://www.ahajournals.org/doi/10.1161/CIRCULATIONAHA.117.032047">5 factors to a long life</a>. Right now, I’m aiming to get down to about 10% body fat, which is likely to be around 165 pounds. Then I’ll flip my eating habits into muscle-building mode, which will require a slight caloric excess rather than a deficit. </p>
<p>Stay tuned to see what happens!</p>2024-02-16T19:56:32+00:00Melissa Wen: Keep an eye out: We are preparing the 2024 Linux Display Next Hackfest!
https://melissawen.github.io/blog/2024/02/16/stay-tuned-display-hackfest-2024
<p>Igalia is preparing the 2024 Linux Display Next Hackfest and we are thrilled to
announce that this year’s hackfest will take place from May 14th to 16th at our
HQ in A Coruña, Spain.</p>
<p><img alt="" src="https://github.com/melissawen/melissawen.github.io/blob/master/img/orzan-corunha-low.jpg?raw=true" /></p>
<p>This unconference-style event aims to bring together the most relevant players
in the Linux display community to tackle current challenges and chart the
future of the display stack.</p>
<p>Key goals for the hackfest include:</p>
<ul>
<li><strong>Unleashing the power of collaboration:</strong> We’ll work to remove bottlenecks
and pave the way for smoother, more performant displays.</li>
<li><strong>Problem-solving powerhouse:</strong> Brainstorming sessions and collaborative
coding will target issues like HDR, color management, variable refresh rates,
and more.</li>
<li><strong>Building on past commitments:</strong> Let’s solidify the progress made in recent
years and push the boundaries even further.</li>
</ul>
<p>The hackfest fosters an intimate and focused environment to brainstorm, hack,
and design solutions alongside fellow display experts. Participants will dive
into discussions, tinker with code, and contribute to shaping the future of the
Linux display stack.</p>
<p>More details are available on <a href="https://events.pages.igalia.com/linuxdisplaynexthackfest/">the official website</a>.</p>
<p>Stay tuned! Keep an eye out for more information, mark your calendars and start
prepping your hacking gear.</p>2024-02-16T17:25:00+00:00Alyssa Rosenzweig: Conformant OpenGL 4.6 on the M1
https://rosenzweig.io/blog/conformant-gl46-on-the-m1.html
<p>For years, the M1 has only supported OpenGL 4.1. That changes today –
with our release of full OpenGL® 4.6 and OpenGL® ES 3.2! <a href="https://fedora-asahi-remix.org/">Install Fedora</a> for the latest
M1/M2-series drivers.</p>
<p>Already installed? Just <code style="white-space: nowrap;">dnf upgrade
--refresh</code>.</p>
<p>Unlike the vendor’s non-conformant 4.1 drivers, our <a href="https://gitlab.freedesktop.org/asahi/mesa">open source</a> Linux
drivers are <strong>conformant</strong> to the latest OpenGL versions,
finally promising broad compatibility with modern OpenGL workloads, like
<a href="https://www.blender.org/">Blender</a>.</p>
<p><a href="https://rosenzweig.io/Blender-Wanderer-high.avif"><img height="993" src="https://rosenzweig.io/Blender-Wanderer.avif" title="Screenshot of Blender running on Apple M1 on Fedora Linux 39. The scene is 'Wanderer', depicting a humanoid in a space suit on a rocky terrain, beside a rover with solar panels." width="1465" /></a></p>
<p>Conformant 4.6/3.2 drivers must pass over 100,000 tests to ensure
correctness. The official list of conformant drivers now includes <a href="https://www.khronos.org/conformance/adopters/conformant-products/opengl#submission_347">our
OpenGL 4.6</a> and <a href="https://www.khronos.org/conformance/adopters/conformant-products/opengles#submission_1045">ES
3.2</a>.</p>
<p>While the vendor doesn’t yet support graphics standards like modern
OpenGL, we do. For this Valentine’s Day, we want to profess our love for
interoperable open standards. We want to free users and developers from
lock-in, enabling applications to run anywhere the heart wants without
special ports. For that, we need standards conformance. Six months ago,
we became the <a href="https://rosenzweig.io/blog/first-conformant-m1-gpu-driver.html">first
conformant driver for any standard graphics API for the M1</a> with the
release of OpenGL ES 3.1 drivers. Today, we’ve finished OpenGL with the
full 4.6… and we’re well on the road to Vulkan.</p>
<hr />
<p>Compared to 4.1, OpenGL 4.6 adds dozens of required features,
including:</p>
<ul>
<li>Robustness</li>
<li>SPIR-V</li>
<li><a href="https://rosenzweig.io/blog/asahi-gpu-part-6.html">Clip control</a></li>
<li>Cull distance</li>
<li><a href="https://rosenzweig.io/blog/first-conformant-m1-gpu-driver.html">Compute
shaders</a></li>
<li>Upgraded transform feedback</li>
</ul>
<p>Regrettably, the M1 doesn’t map well to any graphics standard newer
than OpenGL ES 3.1. While Vulkan makes some of these features optional,
the missing features are required to layer DirectX and OpenGL on top. No
existing solution on M1 gets past the OpenGL 4.1 feature set.</p>
<p>How do we break the 4.1 barrier? Without hardware support, new
features need new tricks. Geometry shaders, tessellation, and transform
feedback become compute shaders. Cull distance becomes a transformed
interpolated value. Clip control becomes a vertex shader epilogue. The
list goes on.</p>
<p>For a taste of the challenges we overcame, let’s look at
<strong>robustness</strong>.</p>
<p>Built for gaming, GPUs traditionally prioritize raw performance over
safety. Invalid application code, like a shader that reads a buffer
out-of-bounds, can trigger undefined behaviour. Drivers exploit that to
maximize performance.</p>
<p>For applications like web browsers, that trade-off is undesirable.
Browsers handle untrusted shaders, which they must sanitize to ensure
stability and security. Clicking a malicious link should not crash the
browser. While some sanitization is necessary as graphics APIs are not
security barriers, reducing undefined behaviour in the API can assist
“defence in depth”.</p>
<p>“Robustness” features can help. Without robustness, out-of-bounds
buffer access in a shader can crash. With robustness, the application
can opt for defined out-of-bounds behaviour, trading some performance
for less attack surface.</p>
<p>All modern cross-vendor APIs include robustness. Many games even
(accidentally?) rely on robustness. Strangely, the vendor’s proprietary
API omits buffer robustness. We must do better for conformance,
correctness, and compatibility.</p>
<p>Let’s first define the problem. Different APIs have different
definitions of what an out-of-bounds load returns when robustness is
enabled:</p>
<ul>
<li>Zero (Direct3D, Vulkan with <code>robustBufferAccess2</code>)</li>
<li>Either zero or some data in the buffer (OpenGL, Vulkan with
<code>robustBufferAccess</code>)</li>
<li>Arbitrary values, but can’t crash (OpenGL ES)</li>
</ul>
<p>OpenGL uses the second definition: return zero or data from the
buffer. One approach is to return the <em>last</em> element of the
buffer for out-of-bounds access. Given the buffer size, we can calculate
the last index. Now consider the <em>minimum</em> of the index being
accessed and the last index. That equals the index being accessed if it
is valid, and some other valid index otherwise. Loading the minimum
index is safe and gives a spec-compliant result.</p>
<p>As an example, a uniform buffer load without robustness might look
like:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb1-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb1-1" tabindex="-1"></a><span class="bu">load</span><span class="op">.</span>i32 result<span class="op">,</span> buffer<span class="op">,</span> index</span></code></pre></div>
<p>Robustness adds a single unsigned minimum (<code>umin</code>)
instruction:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb2-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb2-1" tabindex="-1"></a>umin idx<span class="op">,</span> index<span class="op">,</span> last</span>
<span id="cb2-2"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb2-2" tabindex="-1"></a><span class="bu">load</span><span class="op">.</span>i32 result<span class="op">,</span> buffer<span class="op">,</span> idx</span></code></pre></div>
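<p>As a quick illustration of those semantics (a Python model, not driver code; the name <code>robust_load</code> is mine), the clamp reduces to:</p>

```python
def robust_load(buffer, index):
    # Model of a robust buffer load: the index is clamped to the last
    # valid element (the umin), so an out-of-bounds read returns "some
    # data in the buffer", as the OpenGL definition allows.
    last = len(buffer) - 1           # known from the buffer size
    return buffer[min(index, last)]  # umin idx, index, last; then load

data = [10, 20, 30, 40]
assert robust_load(data, 2) == 30   # valid index: unchanged
assert robust_load(data, 99) == 40  # out-of-bounds: last element
```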
<p>Is the robust version slower? It can be. The difference should be
small percentage-wise, as arithmetic is faster than memory. With
thousands of threads running in parallel, the arithmetic cost may even
be hidden by the load’s latency.</p>
<p>There’s another trick that speeds up robust uniform buffers. Like
other GPUs, the M1 supports “preambles”. The idea is simple: instead of
calculating the same value in every thread, it’s faster to calculate
once and reuse the result. The compiler identifies eligible calculations
and moves them to a preamble executed before the main shader. These
redundancies are common, so preambles provide a nice speed-up.</p>
<p>We usually move uniform buffer loads to the preamble when every
thread loads the same index. Since the size of a uniform buffer is
fixed, extra robustness arithmetic is <em>also</em> moved to the
preamble. The robustness is “free” for the main shader. For robust
storage buffers, the clamping might move to the preamble even if the
load or store cannot.</p>
<p>Armed with robust uniform and storage buffers, let’s consider robust
“vertex buffers”. In graphics APIs, the application can set vertex
buffers with a base GPU address and a chosen layout of “attributes”
within each buffer. Each attribute has an offset and a format, and the
buffer has a “stride” indicating the number of bytes per vertex. The
vertex shader can then read attributes, implicitly indexing by the
vertex. To do so, the shader loads the address:</p>
<p><img alt="Base plus stride times vertex plus offset" style="display: block; margin: 0 auto;" /></p>
<p>Some hardware implements robust vertex fetch natively. Other hardware
has bounds-checked buffers to accelerate robust software vertex fetch.
Unfortunately, the M1 has neither. We need to implement vertex fetch
with raw memory loads.</p>
<p>One instruction set feature helps. In addition to a 64-bit base
address, the M1 GPU’s memory loads also take an offset in
<em>elements</em>. The hardware shifts the offset and adds to the 64-bit
base to determine the address to fetch. Additionally, the M1 has a
combined integer multiply-add instruction <code>imad</code>. Together,
these features let us implement vertex loads in two instructions. For
example, a 32-bit attribute load looks like:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb3-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb3-1" tabindex="-1"></a>imad idx<span class="op">,</span> stride<span class="op">/</span><span class="dv">4</span><span class="op">,</span> vertex<span class="op">,</span> offset<span class="op">/</span><span class="dv">4</span></span>
<span id="cb3-2"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb3-2" tabindex="-1"></a><span class="bu">load</span><span class="op">.</span>i32 result<span class="op">,</span> base<span class="op">,</span> idx</span></code></pre></div>
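<p>In other words (a Python sketch of the address arithmetic for a 32-bit attribute, assuming the stride and offset are 4-byte aligned; the function name is mine):</p>

```python
def attr_address(base, stride, offset, vertex, elem_size=4):
    # Byte address of a vertex attribute, mirroring the imad + load
    # pair: the index is computed in elements (imad), then the hardware
    # scales it back to bytes when adding it to the 64-bit base.
    idx = (stride // elem_size) * vertex + (offset // elem_size)  # imad
    return base + idx * elem_size  # load's implicit element scaling

# stride 16, offset 8, vertex 3: base + 16*3 + 8 = base + 56
assert attr_address(0x1000, 16, 8, 3) == 0x1038
```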
<p>The hardware load can perform an additional small shift. Suppose our
attribute is a vector of 4 32-bit values, densely packed into a buffer
with no offset. We can load that attribute in one instruction:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb4-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb4-1" tabindex="-1"></a><span class="bu">load</span><span class="op">.</span>v4i32 result<span class="op">,</span> base<span class="op">,</span> vertex <span class="op"><<</span> <span class="dv">2</span></span></code></pre></div>
<p>…with the hardware calculating the address:</p>
<p><img alt="Base plus 4 times vertex left shifted 2, which equals Base plus 16 times vertex" style="display: block; margin: 0 auto;" /></p>
<p>What about robustness?</p>
<p>We want to implement robustness with a clamp, like we did for uniform
buffers. The problem is that the vertex buffer size is given in bytes,
while our optimized load takes an index in “vertices”. A single vertex
buffer can contain multiple attributes with different formats and
offsets, so we can’t convert the size in bytes to a size in
“vertices”.</p>
<p>Let’s handle the latter problem. We can rewrite the addressing
equation as:</p>
<p><img alt="Base plus offset, which is the attribute base, plus stride times vertex" style="display: block; margin: 0 auto;" /></p>
<p>That is: one buffer with many attributes at different offsets is
equivalent to many buffers with one attribute and no offset. This gives
an alternate perspective on the same data layout. Is this an
improvement? It avoids an addition in the shader, at the cost of passing
more data – addresses are 64-bit while attribute offsets are <a href="https://vulkan.gpuinfo.org/listreports.php?limit=maxVertexInputAttributeOffset&value=4294967295&platform=all0">16-bit</a>.
More importantly, it lets us translate the vertex buffer size in bytes
into a size in “vertices” for <em>each</em> vertex attribute. Instead of
clamping the offset, we clamp the vertex index. We still make full use
of the hardware addressing modes, now with robustness:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb5-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb5-1" tabindex="-1"></a>umin idx<span class="op">,</span> vertex<span class="op">,</span> last valid</span>
<span id="cb5-2"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb5-2" tabindex="-1"></a><span class="bu">load</span><span class="op">.</span>v4i32 result<span class="op">,</span> base<span class="op">,</span> idx <span class="op"><<</span> <span class="dv">2</span></span></code></pre></div>
<p>We need to calculate the last valid vertex index ahead-of-time for
each attribute. Each attribute has a format with a particular size.
Manipulating the addressing equation, we can calculate the last
<em>byte</em> accessed in the buffer (plus 1) relative to the base:</p>
<p><img alt="Offset plus stride times vertex plus format" style="display: block; margin: 0 auto;" /></p>
<p>The load is valid when that value is bounded by the buffer size in
bytes. We solve the integer inequality as:</p>
<p><img alt="Vertex less than or equal to the floor of size minus offset minus format divided by stride" style="display: block; margin: 0 auto;" /></p>
<p>The driver calculates the right-hand side and passes it into the
shader.</p>
<p>One last problem: what if a buffer is too small to load
<em>anything</em>? Clamping won’t save us – the code would clamp to a
negative index. In that case, the attribute is entirely invalid, so we
swap the application’s buffer for a small buffer of zeroes. Since we
gave each attribute its own base address, this determination is
per-attribute. Then clamping the index to zero correctly loads
zeroes.</p>
<p>Putting it together, a little driver math gives us robust buffers at
the cost of one <code>umin</code> instruction.</p>
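<p>A Python sketch of that driver-side math (my names and return structure, for illustration only; the real driver of course works on its own internal state):</p>

```python
def last_valid_vertex(size, offset, fmt_size, stride):
    # Largest vertex index whose load stays inside a buffer of `size`
    # bytes, solving  offset + stride*vertex + fmt_size <= size.
    return (size - offset - fmt_size) // stride

def setup_attribute(buf_size, offset, fmt_size, stride):
    last = last_valid_vertex(buf_size, offset, fmt_size, stride)
    if last < 0:
        # Buffer too small to load anything: swap in a small buffer of
        # zeroes and clamp every vertex index to 0.
        return {"use_zero_buffer": True, "last": 0}
    return {"use_zero_buffer": False, "last": last}

# 100-byte buffer, 16-byte vec4 attribute at offset 4, stride 16:
# (100 - 4 - 16) // 16 = 5, so vertices 0..5 are in bounds.
assert setup_attribute(100, 4, 16, 16) == {"use_zero_buffer": False, "last": 5}
assert setup_attribute(8, 4, 16, 16)["use_zero_buffer"] is True
```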
<hr />
<p>In addition to buffer robustness, we need image robustness. Like its
buffer counterpart, image robustness requires that out-of-bounds image
loads return zero. That formalizes a guarantee that reasonable hardware
already makes.</p>
<p>…But it would be no fun if our hardware was reasonable.</p>
<p>Running the conformance tests for image robustness, there is a single
test failure affecting “mipmapping”.</p>
<p>For background, mipmapped images contain multiple “levels of detail”.
The base level is the original image; each successive level is the
previous level downscaled. When rendering, the hardware selects the
level closest to matching the on-screen size, improving efficiency and
visual quality.</p>
<p>With robustness, the specifications all agree that image loads
return…</p>
<ul>
<li>Zero if the X- or Y-coordinate is out-of-bounds</li>
<li>Zero if the level is out-of-bounds</li>
</ul>
<p>Meanwhile, image loads on the M1 GPU return…</p>
<ul>
<li>Zero if the X- or Y-coordinate is out-of-bounds</li>
<li>Values from the last level if the level is out-of-bounds</li>
</ul>
<p>Uh-oh. Rather than returning zero for out-of-bounds levels, the
hardware clamps the level and returns nonzero values. It’s a mystery
why. The vendor does not document their hardware publicly, forcing us to
rely on reverse engineering to build drivers. Without documentation, we
don’t know if this behaviour is intentional or a hardware bug. Either
way, we need a workaround to pass conformance.</p>
<p>The obvious workaround is to never load from an invalid level:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode glsl"><code class="sourceCode glsl"><span id="cb6-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb6-1" tabindex="-1"></a><span class="kw">if</span> <span class="op">(</span>level <span class="op"><=</span> levels<span class="op">)</span> <span class="op">{</span></span>
<span id="cb6-2"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb6-2" tabindex="-1"></a> <span class="kw">return</span> <span class="bu">imageLoad</span><span class="op">(</span>x<span class="op">,</span> y<span class="op">,</span> level<span class="op">);</span></span>
<span id="cb6-3"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb6-3" tabindex="-1"></a><span class="op">}</span> <span class="kw">else</span> <span class="op">{</span></span>
<span id="cb6-4"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb6-4" tabindex="-1"></a> <span class="kw">return</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb6-5"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb6-5" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>That involves branching, which is inefficient. Loading an
out-of-bounds level doesn’t crash, so we can speculatively load and then
use a compare-and-select operation instead of branching:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode glsl"><code class="sourceCode glsl"><span id="cb7-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb7-1" tabindex="-1"></a><span class="dt">vec4</span> data <span class="op">=</span> <span class="bu">imageLoad</span><span class="op">(</span>x<span class="op">,</span> y<span class="op">,</span> level<span class="op">);</span></span>
<span id="cb7-2"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb7-2" tabindex="-1"></a></span>
<span id="cb7-3"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb7-3" tabindex="-1"></a><span class="kw">return</span> <span class="op">(</span>level <span class="op"><=</span> levels<span class="op">)</span> <span class="op">?</span> data <span class="op">:</span> <span class="dv">0</span><span class="op">;</span></span></code></pre></div>
<p>This workaround is okay, but it could be improved. While the M1 GPU
has combined compare-and-select instructions, the instruction set is
<em>scalar</em>. Each thread processes one value at a time, not a vector
of multiple values. However, image loads return a vector of four
components (red, green, blue, alpha). While the pseudo-code looks
efficient, the resulting assembly is not:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb8-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb8-1" tabindex="-1"></a>image_load R<span class="op">,</span> x<span class="op">,</span> y<span class="op">,</span> level</span>
<span id="cb8-2"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb8-2" tabindex="-1"></a>ulesel R<span class="op">[</span><span class="dv">0</span><span class="op">],</span> level<span class="op">,</span> levels<span class="op">,</span> R<span class="op">[</span><span class="dv">0</span><span class="op">],</span> <span class="dv">0</span></span>
<span id="cb8-3"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb8-3" tabindex="-1"></a>ulesel R<span class="op">[</span><span class="dv">1</span><span class="op">],</span> level<span class="op">,</span> levels<span class="op">,</span> R<span class="op">[</span><span class="dv">1</span><span class="op">],</span> <span class="dv">0</span></span>
<span id="cb8-4"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb8-4" tabindex="-1"></a>ulesel R<span class="op">[</span><span class="dv">2</span><span class="op">],</span> level<span class="op">,</span> levels<span class="op">,</span> R<span class="op">[</span><span class="dv">2</span><span class="op">],</span> <span class="dv">0</span></span>
<span id="cb8-5"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb8-5" tabindex="-1"></a>ulesel R<span class="op">[</span><span class="dv">3</span><span class="op">],</span> level<span class="op">,</span> levels<span class="op">,</span> R<span class="op">[</span><span class="dv">3</span><span class="op">],</span> <span class="dv">0</span></span></code></pre></div>
<p>Fortunately, the vendor driver has a trick. We know the hardware
returns zero if either X or Y is out-of-bounds, so we can <em>force</em>
a zero output by <em>setting</em> X or Y out-of-bounds. As the maximum
image size is 16384 pixels wide, any X greater than 16384 is
out-of-bounds. That justifies an alternate workaround:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode glsl"><code class="sourceCode glsl"><span id="cb9-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb9-1" tabindex="-1"></a><span class="dt">bool</span> valid <span class="op">=</span> <span class="op">(</span>level <span class="op"><=</span> levels<span class="op">);</span></span>
<span id="cb9-2"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb9-2" tabindex="-1"></a><span class="dt">int</span> x_ <span class="op">=</span> valid <span class="op">?</span> x <span class="op">:</span> <span class="dv">20000</span><span class="op">;</span></span>
<span id="cb9-3"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb9-3" tabindex="-1"></a></span>
<span id="cb9-4"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb9-4" tabindex="-1"></a><span class="kw">return</span> <span class="bu">imageLoad</span><span class="op">(</span>x_<span class="op">,</span> y<span class="op">,</span> level<span class="op">);</span></span></code></pre></div>
<p>Why is this better? We only change a single scalar, not a whole
vector, compiling to compact scalar assembly:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode asm"><code class="sourceCode fasm"><span id="cb10-1"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb10-1" tabindex="-1"></a>ulesel x_<span class="op">,</span> level<span class="op">,</span> levels<span class="op">,</span> x<span class="op">,</span> <span class="op">#</span><span class="dv">20000</span></span>
<span id="cb10-2"><a href="https://rosenzweig.io/blog/gpu-feed.xml#cb10-2" tabindex="-1"></a>image_load R<span class="op">,</span> x_<span class="op">,</span> y<span class="op">,</span> level</span></code></pre></div>
<p>If we preload the constant to a uniform register, the workaround is a
single instruction. That’s optimal – and it passes conformance.</p>
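<p>The quirk and the workaround can be modelled in a few lines of Python (the nested-list image representation and function names are mine; the 20000 constant follows the text above):</p>

```python
OOB_X = 20000  # anything past the 16384-pixel maximum image width

def hw_image_load(mip_levels, x, y, level):
    # Model of the hardware behaviour: out-of-bounds x/y return zero,
    # but an out-of-bounds level is clamped to the last level (the quirk).
    level = min(level, len(mip_levels) - 1)
    img = mip_levels[level]
    if y >= len(img) or x >= len(img[0]):
        return 0
    return img[y][x]

def robust_image_load(mip_levels, x, y, level):
    # Workaround: force x out-of-bounds when the level is invalid, so
    # the hardware's own x bounds check produces the zero we need.
    if level >= len(mip_levels):
        x = OOB_X
    return hw_image_load(mip_levels, x, y, level)

mips = [[[5]], [[7]]]                      # two 1x1 levels
assert hw_image_load(mips, 0, 0, 9) == 7   # quirk: clamps to last level
assert robust_image_load(mips, 0, 0, 9) == 0
assert robust_image_load(mips, 0, 0, 1) == 7
```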
<hr />
<p><em>Blender <a href="https://download.blender.org/demo/eevee/wanderer/wanderer.blend">“Wanderer”</a>
demo by <a href="https://www.artstation.com/dbystedt">Daniel
Bystedt</a>, licensed CC BY-SA.</em></p>2024-02-14T05:00:00+00:00Lucas Fryzek: A Dive into Vulkanised 2024
https://fryzekconcepts.com/notes/vulkanised_2024.html
<figure>
<img alt="Vulkanised sign at Google’s office" src="https://fryzekconcepts.com/assets/vulkanised_2024/vulkanized_logo_web.jpg" />
Vulkanised sign at Google’s
office
</figure>
<p>Last week I had an exciting opportunity to attend the Vulkanised 2024
conference. For those of you not familiar with the event, it is <a href="https://vulkan.org/events/vulkanised-2024">“The Premier Vulkan
Developer Conference”</a> hosted by the Vulkan working group from
Khronos. With the excitement out of the way, I decided to write about
some of the interesting information that came out of the conference.</p>
<h2 id="a-few-presentations">A Few Presentations</h2>
<p>My colleagues Iago, Stéphane, and Hyunjun each had the opportunity to
present on some of their work in the wider Vulkan ecosystem.</p>
<figure>
<img alt="Stéphane and Hyunjun presenting" src="https://fryzekconcepts.com/assets/vulkanised_2024/vulkan_video_web.jpg" />
Stéphane and Hyunjun
presenting
</figure>
<p>Stéphane & Hyunjun presented “Implementing a Vulkan Video Encoder
From Mesa to GStreamer”. They jointly talked about the work they
performed to implement the Vulkan video extensions in Intel’s ANV Mesa
driver as well as in GStreamer. This was an interesting presentation
because you got to see how the new Vulkan video extensions affected both
driver developers implementing the extensions and application developers
making use of the extensions for real-time video decoding and encoding.
<a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-stephane-cerveau-ko-igalia.pdf">Their
presentation is available on vulkan.org</a>.</p>
<figure>
<img alt="Iago presenting" src="https://fryzekconcepts.com/assets/vulkanised_2024/opensource_vulkan_web.jpg" />
Iago presenting
</figure>
<p>Later my colleague Iago presented jointly with Faith Ekstrand (a
well-known Linux graphic stack contributor from Collabora) on “8 Years
of Open Drivers, including the State of Vulkan in Mesa”. They both
talked about the current state of Vulkan in the open source driver
ecosystem, and some of the benefits open source drivers have been able
to take advantage of, like the common Vulkan runtime code and a shared
compiler stack. You can check out <a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/Vulkanised-2024-faith-ekstrand-collabora-Iago-toral-igalia.pdf">their
presentation for all the details</a>.</p>
<p>Besides Igalia’s presentations, there were several more which I found
interesting, with topics such as Vulkan developer tools, experiences of
using Vulkan in real work applications, and even how to teach Vulkan to
new developers. Here are some highlights for some of them.</p>
<h3 id="using-vulkan-synchronization-validation-effectively"><a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-john-zulauf-lunarg.pdf">Using
Vulkan Synchronization Validation Effectively</a></h3>
<p>John Zulauf gave a presentation on the Vulkan synchronization
validation layers that he has been working on. If you are not familiar
with these, then you should really check them out. They work by tracking
how resources are used inside Vulkan and providing error messages with
some hints if you use a resource in a way where it is not synchronized
properly. It can’t catch every error, but it’s a great tool in the
toolbelt of Vulkan developers to make their lives easier when it comes
to debugging synchronization issues. As John said in the presentation,
synchronization in Vulkan is hard, and nearly every application he
tested the layers on revealed a synchronization issue, no matter how
simple it was. He can proudly say he is a vkQuake contributor now
because of these layers.</p>
<h3 id="years-of-teaching-vulkan-with-example-for-video-extensions"><a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-helmut-hlavacs.pdf">6
Years of Teaching Vulkan with Example for Video Extensions</a></h3>
<p>This was an interesting presentation from a professor at the
University of Vienna about his experience teaching graphics as well as
game development to students who may have little real programming
experience. He covered the techniques he uses to make learning easier as
well as resources that he uses. This would be a great presentation to
check out if you’re trying to teach Vulkan to others.</p>
<h3 id="vulkan-synchronization-made-easy"><a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-grigory-dzhavadyan.pdf">Vulkan
Synchronization Made Easy</a></h3>
<p>Another presentation focused on Vulkan sync, but instead of debugging
it, Grigory showed how his graphics library abstracts sync away from the
user without implementing a render graph. He presented an interesting
technique that is similar to how the sync validation layers work when it
comes to ensuring that resources are always synchronized before use. If
you’re building your own engine in Vulkan, this is definitely something
worth checking out.</p>
<h3 id="vulkan-video-encode-api-a-deep-dive"><a href="https://vulkan.org/user/pages/09.events/vulkanised-2024/vulkanised-2024-tony-zlatinski-nvidia.pdf">Vulkan
Video Encode API: A Deep Dive</a></h3>
<p>Tony at Nvidia did a deep dive into the new Vulkan Video extensions,
explaining a bit about how video codecs work, and also including a
roadmap for future codec support in the video extensions. Especially
interesting for us was that he made a nice call-out to Igalia and our
work on Vulkan Video CTS and open source driver support on slide (6)
:)</p>
<h2 id="thoughts-on-vulkanised">Thoughts on Vulkanised</h2>
<p>Vulkanised is an interesting conference that gives you the
intersection of people working on Vulkan drivers, game developers using
Vulkan for their graphics backend, visual FX tool developers using
Vulkan-based tools in their pipeline, industrial application developers
using Vulkan for some embedded commercial systems, and general hobbyists
who are just interested in Vulkan. As an example of some of these
interesting audience members, I got to talk with a member of the Blender
foundation about his work on the Vulkan backend to Blender.</p>
<p>Lastly, the event was held at Google’s offices in Sunnyvale, which
I’m always happy to travel to: not just for the better weather (coming
from Canada), but also for the amazing restaurants and food in the Bay
Area!</p>
<figure>
<img alt="Great Bay Area food" src="https://fryzekconcepts.com/assets/vulkanised_2024/food_web.jpg" />
Great Bay Area food
</figure>2024-02-14T05:00:00+00:00Bastien Nocera: New and old apps on Flathub
https://www.hadess.net/2024/02/new-and-old-apps-on-flathub.html
<p><b>3D Printing Slicers</b><br /></p><p> I recently replaced my <a href="https://www.flashforge.com/product-detail/flashforge-adventurer-3-3d-printer" target="_blank">Flashforge Adventurer 3</a> printer that I had been using for a few years as my first printer with a <a href="https://bambulab.com/en/x1">BambuLab X1 Carbon</a>, wanting a printer that was not a “project” so I could focus on modelling and printing. It's an investment, but my partner convinced me that I was using the printer often enough to warrant it, and told me to look out for Black Friday sales, which I did. </p><p>The hardware-specific slicer, <a href="https://github.com/bambulab/BambuStudio">Bambu Studio</a>, was available for Linux, but only as an AppImage, with many people reporting crashes on startup, non-working video live view, and other problems that the hardware maker tried to work around by shipping separate AppImage variants for Ubuntu and Fedora.</p><p>After close to <a href="https://github.com/bambulab/BambuStudio/pulls?q=is%3Apr+author%3Ahadess+">150 patches to the upstream software</a> (which, in hindsight, I could probably have avoided by compiling the C++ code with LLVM), I managed to “flatpak” the application and <a href="https://flathub.org/apps/manage/com.bambulab.BambuStudio">make it available on Flathub</a>. 
It's reached 3k installs in about a month, which is quite a bit for a niche piece of software.</p><p>Note that if you click the “Donate” button <a href="https://flathub.org/apps/com.bambulab.BambuStudio">on the Flathub page</a>, it will take you to a page where you can <strike>feed my transformed fossil fuel addiction</strike> buy filament for repairs and printing perfectly fitting everyday items, rather than bulk importing them from the other side of the planet.<br /></p><p style="text-align: center;"><img alt="Screenshot" class="yarl__slide_image" draggable="false" src="https://dl.flathub.org/repo/screenshots/com.bambulab.BambuStudio-stable/1248x702/com.bambulab.BambuStudio-7c7f64448845439b3a88d4f2b28dd225.png" /><br /> </p><p style="text-align: center;"><i>Preparing a <a href="https://www.printables.com/model/285296-mini-gg-consolized-game-gear-ggtv">Game Gear consoliser shell</a></i><br /></p><p></p><p>I will continue to maintain the <a href="https://flathub.org/apps/com.flashforge.FlashPrint">FlashPrint slicer</a> for FlashForge printers, installed by nearly 15k users, although I <a href="https://github.com/flathub/com.flashforge.FlashPrint/pull/50">enabled automated updates</a> now, and will not be updating the release notes, which required manual intervention.</p><p>FlashForge have unfortunately never answered my queries about making this distribution of their software official (and fixing the crash when using a VPN...).<br /></p><p><b> Rhythmbox</b> <br /></p><p>As I was updating the <a href="https://flathub.org/apps/org.gnome.Rhythmbox3">Rhythmbox Flatpak on Flathub</a>, I realised that it just reached 250k installs, which puts the number of installations of those 3D printing slicers above into perspective.</p><p style="text-align: center;"><img alt="rhythmbox-main-window.png" class="gl-max-w-full" height="276" src="https://gitlab.gnome.org/GNOME/rhythmbox/-/raw/master/data/screenshots/rhythmbox-main-window.png?ref_type=heads" width="400" /> </p><p 
style="text-align: center;"><i>The updated screenshot used on Flathub</i></p><p style="text-align: left;">Congratulations, and many thanks, to all the developers that keep on contributing to this very mature project, especially Jonathan Matthew who's been maintaining the app since 2008.<br /></p>2024-02-09T14:42:00+00:00Tomeu Vizoso: Etnaviv NPU update 16: A nice performance jump
https://blog.tomeuvizoso.net/2024/02/etnaviv-npu-update-16-nice-performance.html
<p>After the open-source driver for <a href="https://www.verisilicon.com/en/IPPortfolio/VivanteNPUIP">VeriSilicon's Vivante NPU</a> was <a href="https://blog.tomeuvizoso.net/2024/01/etnaviv-npu-update-15-we-are-upstream.html">merged into Mesa</a> two weeks ago, I have been taking some rest and thinking about what will come next.</p><h3 style="text-align: left;">Automated testing <br /></h3><p>I have a <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27214">merge request</a> to Mesa almost ready that will enable continuous integration testing on real hardware, but it depends on solving what seem to be problems with the power supplies of the boards in the HW testing lab. <a href="https://www.collabora.com/">Collabora</a> is graciously looking at it. Thanks!</p><h3 style="text-align: left;">Performance<br /></h3><p>I have been talking with quite a few people about the whole effort of bringing open-source to NPU hardware and something that came up more than once is the question of reaching or surpassing the performance level of the proprietary drivers.</p><p>It is a fair concern, because the systolic arrays will be underutilized if they are starved of data. And given how fast they are in performing the arithmetic operations, and how slow memory buses and chips on embedded are (relative to high-end GPUs, at least), this starving and the consequent underutilization are very likely to happen.<br /></p><p>IP vendors go to great lengths to prevent that from happening, inventing ways of getting the data faster to the processing elements, reducing the memory bandwidth used, and balancing the use of the different cores/arrays. There is plenty of published research in this area, which helps when figuring out how to make the most of a particular piece of hardware.<br /></p><h3 style="text-align: left;">Weight compression <br /></h3><p></p><p>Something I started working on last week is compression of zero values in the weight buffers. 
<a href="https://arxiv.org/abs/2102.00554">Sparsity</a> is very common in the neural models that this hardware is targeted to run, and common convolutions such as strided and depthwise can easily have zero ratios of 90% and more.</p><p>By compressing consecutive zeroes in a buffer we can greatly reduce pressure on the memory bus, keeping the processing units better fed (though I'm sure we are still far from getting good utilization).</p><p>By opportunistically using the 5 available bits to compress consecutive runs of zeroes, I was able to improve the performance of the MobileNetV1 model from 15.7 ms to 9.9 ms, and that of the SSDLite MobileDet model from 56.1 ms to 32.7 ms.</p><p></p><div class="separator" style="clear: both; text-align: center;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilf8m0CkxyFeQ7N-8XfsKx6dQjCdBxW1uJaOn2JrsAxAnNSZSLoiAlh-6Jw05edEoykz6U2PsuROOMOMi3-kGqpv-gqBiasERfcUnHOtGiWfQBQtDzhApd7lSU4gL83WkTW5Qzts32f8wPvg6DbZYeZNflL8HdDi9313PQJMR34D2r7Ku7fif2q9TpmLQ/s848/perf_evol.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="326" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilf8m0CkxyFeQ7N-8XfsKx6dQjCdBxW1uJaOn2JrsAxAnNSZSLoiAlh-6Jw05edEoykz6U2PsuROOMOMi3-kGqpv-gqBiasERfcUnHOtGiWfQBQtDzhApd7lSU4gL83WkTW5Qzts32f8wPvg6DbZYeZNflL8HdDi9313PQJMR34D2r7Ku7fif2q9TpmLQ/w640-h326/perf_evol.png" width="640" /></a></div><br /><br /></div><p></p><p>As shown in the graph above, we still have quite some room for improvement before we reach the performance of the proprietary driver, but we are getting close pretty fast. 
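As a rough illustration of the technique (a toy sketch only; the actual bit layout the hardware consumes is not reproduced here), run-length encoding the zeroes with a 5-bit run field could look like this:

```python
# Toy sketch of zero run-length compression for weight buffers.
# Hypothetical format, NOT the real Vivante/etnaviv encoding:
# each element is either (0, value) for a non-zero weight, or
# (1, run_length) for up to 31 consecutive zeroes -- the largest
# run expressible in a 5-bit field.

MAX_RUN = 31

def compress(weights):
    out, i = [], 0
    while i < len(weights):
        if weights[i] == 0:
            run = 0
            while i < len(weights) and weights[i] == 0 and run < MAX_RUN:
                run += 1
                i += 1
            out.append((1, run))
        else:
            out.append((0, weights[i]))
            i += 1
    return out

def decompress(stream):
    out = []
    for is_run, payload in stream:
        out.extend([0] * payload if is_run else [payload])
    return out
```

With the 90%-plus zero ratios mentioned above, the encoded stream shrinks towards the fraction of non-zero weights, which is what relieves the memory bus.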
I also believe that we can tailor the driver to users' needs to surpass the performance of the proprietary driver for specific models, as this is open-source and everybody can chip in, see how things are made and improve them.</p><h3 style="text-align: left;">IRC channel</h3><p>I mentioned this in passing some time ago, but now that we have a driver at this level of usefulness, I think it is a good moment to remind everyone that we have an IRC channel in the OFTC network to discuss anything about doing accelerated machine learning on the edge with upstream open-source software: #ml-mainline. You can click <a href="https://webchat.oftc.net/?channels=ml-mainline" target="_blank">here</a> to join via a web interface, though I recommend setting up an account at <a href="https://blog.christophersmart.com/2022/03/21/joining-a-bridged-irc-network-on-element-matrix/">matrix.org</a>.</p><h3 style="text-align: left;">What next</h3><p>Should I continue working on performance? Enable more models for new use cases? Enable this driver on more SoCs (i.MX8MP and S905D3 look interesting)? Start writing a driver for a completely different IP, such as Rockchip's or Amlogic's?</p><p>I still haven't decided, so if you have an opinion please drop a comment in this blog, or at any of the social networks linked from this blog.</p><p>I'm currently available for contracting, so I should be able to get on your project full-time on short notice.<br /></p>2024-02-08T09:36:00+00:00Nicolai Hähnle: Building a HIP environment from scratch
http://nhaehnle.blogspot.com/2024/02/building-hip-environment-from-scratch.html
<p>HIP is a C++-based, single-source programming language for writing GPU code. "Single-source" means that a single source file can contain both the "host code" which runs on the CPU and the "device code" which runs on the GPU. In a sense, HIP is "CUDA for AMD", except that HIP can actually target both AMD and Nvidia GPUs.</p><p>If you merely want to <i>use</i> HIP, your best bet is to look at <a href="https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html">the documentation</a> and download pre-built packages. (By the way, the documentation calls itself "ROCm" because that's what AMD calls its overall compute platform. It includes HIP, OpenCL, and more.)</p><p>I like to dig deep, though, so I decided I want to build at least the user space parts myself to the point where I can build a simple <a href="https://github.com/ROCm/HIP-Examples/tree/master/HIP-Examples-Applications/HelloWorld">HelloWorld</a> using a Clang from <a href="https://github.com/llvm/llvm-project">upstream LLVM</a>. It's all open-source, after all!</p><p>It's a bit tricky, though, in part because of the kind of bootstrapping problems you usually get when building toolchains: Running the compiler requires runtime libraries, at least by default, but building the runtime libraries requires a compiler. Luckily, it's not quite <i>that</i> difficult, though, because compiling the host libraries doesn't require a HIP-enabled compiler - any C++ compiler will do. And while the device libraries do require a HIP- (and OpenCL-)enabled compiler, it is possible to build code in a "freestanding" environment where runtime libraries aren't available.<br /></p><p>What follows is pretty much just a list of steps with running commentary on what the individual pieces do, since I didn't find an equivalent recipe in the official documentation. Of course, by the time you read this, it may well be outdated. 
Good luck!</p><p>Components need to be installed, but installing into some arbitrary prefix inside your <span style="font-family: courier;">$HOME</span> works just fine. Let's call it <span style="font-family: courier;">$HOME/prefix</span>. All packages use CMake and can be built using invocations along the lines of:</p><p><span style="font-family: courier;">cmake -S . -B build -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=$HOME/prefix -DCMAKE_PREFIX_PATH=$HOME/prefix<br />ninja -C build install</span></p><p><span style="font-family: inherit;">In some cases, additional variables need to be set.</span><br /></p><h3 style="text-align: left;">Step 1: clang and lld</h3><p>We're going to need a compiler and linker, so let's get <a href="https://github.com/llvm/llvm-project">llvm/llvm-project</a> and build it with Clang and LLD enabled: <span style="font-family: courier;">-DLLVM_ENABLE_PROJECTS='clang;lld' -DLLVM_TARGETS_TO_BUILD='X86;AMDGPU'</span></p><p>Building LLVM is an art of its own which is luckily <a href="https://llvm.org/docs/GettingStarted.html#local-llvm-configuration">reasonably well documented</a>, so I'm going to leave it at that.</p><h3 style="text-align: left;">Step 2: Those pesky cmake files</h3><p>Build and install <a href="https://github.com/ROCm/rocm-cmake">ROCm/rocm-cmake</a> to avoid cryptic error messages down the road when building other components that use those CMake files without documenting the dependency clearly. Not rocket science, but man am I glad for GitHub's search function.<br /></p><h3 style="text-align: left;">Step 3: libhsa-runtime64.so</h3><p>This is the lowest level user space host-side library in the ROCm stack. Its services, as far as I understand them, include setting up device queues and loading "code objects" (device ELF files). All communication with the kernel driver goes through here.</p><p>Notably though, this library does <i>not</i> know how to dispatch a kernel! 
In the ROCm world, the so-called Architected Queueing Language is used for that. An AQL queue is set up with the help of the kernel driver (and <i>that</i> does go through libhsa-runtime64.so), and then a small ring buffer and a "door bell" associated with the queue are mapped into the application's virtual memory space. When the application wants to dispatch a kernel, it (or rather, a higher-level library like libamdhip64.so that it links against) writes an AQL packet into the ring buffer and "rings the door bell", which basically just means writing a new ring buffer head pointer to the door bell's address. The door bell virtual memory page is mapped to the device, so ringing the door bell causes a PCIe transaction (for us peasants; <a href="https://www.amd.com/en/products/accelerators/instinct/mi300/mi300a.html">MI300A</a> has slightly different details under the hood) which wakes up the GPU.</p><p>Anyway, libhsa-runtime64.so comes in two parts for what I am being told are largely historical reasons:</p><ul style="text-align: left;"><li><a href="https://github.com/ROCm/ROCT-Thunk-Interface">ROCm/ROCT-Thunk-Interface</a></li><li><a href="https://github.com/ROCm/ROCR-Runtime">ROCm/ROCR-Runtime</a>; this one has one of those bootstrap issues and needs a <span style="font-family: courier;">-DIMAGE_SUPPORT=OFF</span></li></ul><p>The former is statically linked into the latter...</p><h3 style="text-align: left;">Step 4: It which must not be named</h3><p>For Reasons(tm), there is a fork of LLVM in the ROCm ecosystem, <a href="https://github.com/ROCm/llvm-project">ROCm/llvm-project</a>. Using upstream LLVM for the compiler seems to be fine and is what I as a compiler developer obviously want to do. However, this fork has an <a href="https://github.com/ROCm/llvm-project/tree/amd-staging/amd"><span style="font-family: courier;">amd</span></a> directory with a bunch of pieces that we'll need. 
I believe there is a desire to upstream them, but also an unfortunate hesitation from the LLVM community to accept something so AMD-specific.<br /></p><p>In any case, the required components can each be built individually against the upstream LLVM from step 1:<br /></p><ul style="text-align: left;"><li><a href="https://github.com/ROCm/llvm-project/tree/amd-staging/amd/hipcc">hipcc</a>; this is a frontend for Clang which is supposed to be user-friendly, but at the cost of adding an abstraction layer. I want to look at the details under the hood, so I don't want to and don't have to use it; but some of the later components want it</li><li><a href="https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs">device-libs</a>; as the name says, these are libraries of device code. I'm actually not quite sure what the intended abstraction boundary is between this one and the HIP libraries from the next step. I think these ones are meant to be tied more closely to the compiler so that other libraries, like the HIP library below, don't have to use <span style="font-family: courier;">__builtin_amdgcn_*</span> directly? Anyway, just keep on building...</li><li><a href="https://github.com/ROCm/llvm-project/tree/amd-staging/amd/comgr">comgr</a>; the "code object manager". Provides a stable interface to LLVM, Clang, and LLD services, up to (as far as I understand it) invoking Clang to compile kernels at runtime. But it seems to have no direct connection to the code-related services in libhsa-runtime64.so.<br /></li></ul><p>That last one is annoying. It needs a <span style="font-family: courier;">-DBUILD_TESTING=OFF</span></p><p>Worse, it has a fairly large interface with the C++ code of LLVM, which is famously not stable. In fact, at least during my little adventure, comgr wouldn't build as-is against the LLVM (and Clang and LLD) build that I got from step 1. I had to hack out a little bit of code in its symbolizer. 
I'm sure it's fine.</p><h3 style="text-align: left;">Step 5: libamdhip64.so</h3><p>Finally, here comes the library that implements the host-side HIP API. It also provides a bunch of HIP-specific device-side functionality, mostly by leaning on the device-libs from the previous step.</p><p>It lives in <a href="https://github.com/ROCm/clr">ROCm/clr</a>, which stands for either Compute Language Runtimes or Common Language Runtime. Who knows. Either one works for me. It's obviously for compute, and it's common because it also contains OpenCL support.<br /></p><p>You also need <a href="https://github.com/ROCm/HIP/">ROCm/HIP</a> at this point. I'm not quite sure why stuff is split up into so many repositories. Maybe ROCm/HIP is also used when targeting Nvidia GPUs with HIP, but ROCm/CLR isn't? Not a great justification in my opinion, but at least this <i>is</i> documented in the <a href="https://github.com/ROCm/clr/blob/develop/README.md">README</a>.<br /></p><p>CLR also needs a bunch of additional CMake options: <span style="font-family: courier;">-DCLR_BUILD_HIP=ON -DHIP_COMMON_DIR=${checkout of ROCm/HIP} -DHIPCC_BIN_DIR=$HOME/prefix/bin</span></p><h3 style="text-align: left;">Step 6: Compiling with Clang</h3><p>We can now build simple HIP programs with our own Clang against our own HIP and ROCm libraries:</p><p><span style="font-family: courier;">clang -x hip --offload-arch=gfx1100 --rocm-path=$HOME/prefix -rpath $HOME/prefix/lib -lstdc++ <a href="https://github.com/ROCm/HIP-Examples/blob/master/HIP-Examples-Applications/HelloWorld/HelloWorld.cpp">HelloWorld.cpp</a><br />LD_LIBRARY_PATH=$HOME/prefix/lib ./a.out</span></p><p>Neat, huh?<br /></p>2024-02-07T11:30:00+00:00Robert McQueen: Flathub: Pros and Cons of Direct Uploads
https://ramcq.net/2024/02/06/flathub-pros-and-cons-of-direct-uploads/
<p>I attended FOSDEM last weekend and had the pleasure to participate in the <a href="https://fosdem.org/2024/schedule/event/fosdem-2024-3715-flathub-flatpak-bof/">Flathub / Flatpak BOF</a> on Saturday. A lot of the session was used up by an extensive discussion about the merits (or not) of allowing direct uploads versus building everything centrally on Flathub’s infrastructure, and related concerns such as automated security/dependency scanning.</p>
<p>My original motivation behind the idea was essentially two things. The first was to offer a simpler way forward for applications that use language-specific build tools that resolve and retrieve their own dependencies from the internet. Flathub doesn’t allow network access during builds, and so a lot of manual work and additional tooling is currently needed (see <a href="https://docs.flatpak.org/en/latest/python.html">Python</a> and <a href="https://docs.flatpak.org/en/latest/electron.html">Electron</a> Flatpak guides). And the second was to offer a maybe more familiar flow to developers from other platforms who would just build something and then run another command to upload it to the store, without having to learn the syntax of a new build tool. There were many valid concerns raised in the room, and I think on reflection that this is still worth doing, but might not be as valuable a way forward for Flathub as I had initially hoped.</p>
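For context, the workaround today is to pre-resolve every dependency and list it in the flatpak-builder manifest so the build can run with networking disabled. A minimal sketch of that shape (the package name, URL and hash below are hypothetical placeholders; real tooling such as flatpak-pip-generator derives these from a requirements file):

```python
# Sketch: turn a pre-resolved dependency list into flatpak-builder
# "sources" entries so the sandboxed build needs no network access.
# The package data below is a hypothetical placeholder.
import json

deps = [
    {"name": "example-lib",
     "url": "https://files.example.org/example-lib-1.2.3.tar.gz",
     "sha256": "0" * 64},
]

sources = [
    {"type": "file", "url": d["url"], "sha256": d["sha256"]}
    for d in deps
]
print(json.dumps(sources, indent=2))
```

flatpak-builder downloads and checksum-verifies each listed source up front, so the build itself can then proceed offline.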
<p>Of course, for a proprietary application where Flathub never sees the source or where it’s built, whether that binary is uploaded to us or downloaded by us doesn’t change much. But for a FLOSS application, a direct upload driven by the developer causes a regression on a number of fronts. We’re not getting too hung up on the “malicious developer inserts evil code in the binary” case because Flathub already works on the model of verifying the developer and the user makes a decision to trust that app – we don’t review the source after all. But we do lose other things such as our infrastructure building on multiple architectures, and visibility on whether the build environment or upload credentials have been compromised unbeknownst to the developer.</p>
<p>There is now a manual review process for when apps change their metadata such as name, icon, license and permissions – which would apply to any direct uploads as well. It was suggested that if only heavily sandboxed apps (eg no direct filesystem access without proper use of portals) were permitted to make direct uploads, the impact of such concerns might be somewhat mitigated by the sandboxing.</p>
<p>However, it was also pointed out that my go-to example of “Electron app developers can upload to Flathub with one command” was also a bit of a fiction. At present, none of them would pass that stricter sandboxing requirement. Almost all Electron apps run old versions of Chromium with less complete portal support, needing sandbox escapes to function correctly, and Electron (and Chromium’s) sandboxing still needs additional tooling/downstream patching to run inside a Flatpak. Buh-boh.</p>
<p>I think for established projects who already ship their own binaries from their own centralised/trusted infrastructure, and for developers who have understandable sensitivities about binary integrity, such as encryption, password or financial tools, it’s a definite improvement that we’re able to set up direct uploads with such projects with less manual work. There are already quite a few applications – including verified ones – where the build recipe simply fetches a binary built elsewhere and unpacks it, and if this is already done centrally by the developer, repeating the exercise on Flathub’s server adds little value.</p>
<p>However for the individual developer experience, I think we need to zoom out a bit and think about how to improve this from a tools and infrastructure perspective as we grow Flathub, and as we seek to raise funds from different sources for these improvements. I took notes for everything that was mentioned as a tooling limitation during the BOF, along with a few ideas about how we could improve things, and hope to share these soon as part of an RFP/RFI (Request For Proposals/Request for Information) process. We don’t have funding yet but if we have some prospective collaborators to help refine the scope and estimate the cost/effort, we can use this to go and pursue funding opportunities.</p>2024-02-06T10:57:27+00:00Dave Airlie (blogspot): anv: vulkan av1 decode status
https://airlied.blogspot.com/2024/02/anv-vulkan-av1-decode-status.html
<p> Vulkan Video AV1 decode has been released, and I had some partly working support on the Intel ANV driver previously, but I let it lapse.</p><p>The current branch is at [1]. It builds, but is totally untested; I'll get some time next week to plug in my DG2 and see if I can persuade it to decode some frames.</p><p>Update: the current branch decodes one frame properly, but reference frames need more work unfortunately. <br /></p><p>[1] <a href="https://gitlab.freedesktop.org/airlied/mesa/-/commits/anv-vulkan-video-decode-av1">https://gitlab.freedesktop.org/airlied/mesa/-/commits/anv-vulkan-video-decode-av1</a><br /></p>
https://airlied.blogspot.com/2024/02/radv-vulkan-av1-video-decode-status.html
<p>The Khronos Group announced VK_KHR_video_decode_av1 [1]; this extension adds AV1 decoding to the Vulkan specification. There is a radv branch [2] and merge request [3]. I did some AV1 work on this in the past, but I need to take some time to see if it has made any progress since. I'll post an ANV update once I figure that out.</p><p>This extension is one of the ones I've been wanting for a long time, since having a royalty-free codec is something I can actually care about and ship, as opposed to the painful ones. I started working on a Mesa extension for this a year or so ago with Lynne from the ffmpeg project and we made great progress with it. We submitted that to Khronos and it has gone through the committee process and been refined and validated amongst the hardware vendors.</p><p>I'd like to say thanks to Charlie Turner and Igalia for taking over a lot of the porting to the Khronos extension and fixing up bugs that their CTS development brought up. This is a great feature of having open-source drivers: it allows much quicker turnaround on bug fixes when devs can fix them themselves!<br /></p><p>[1]: <a href="https://www.khronos.org/blog/khronos-releases-vulkan-video-av1-decode-extension-vulkan-sdk-now-supports-h.264-h.265-encode">https://www.khronos.org/blog/khronos-releases-vulkan-video-av1-decode-extension-vulkan-sdk-now-supports-h.264-h.265-encode</a> </p><p>[2] <a href="https://gitlab.freedesktop.org/airlied/mesa/-/tree/radv-vulkan-video-decode-av1">https://gitlab.freedesktop.org/airlied/mesa/-/tree/radv-vulkan-video-decode-av1</a></p><p>[3] <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27424">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27424</a><br /></p>2024-02-02T02:27:20+00:00Bastien Nocera: Re: New responsibilities
https://www.hadess.net/2024/01/re-new-responsibilities.html
<p> A few months have passed since <a href="https://www.hadess.net/2023/08/new-responsibilities.html">New Responsibilities</a> was posted, so I thought I would provide an update.</p><p></p><p><b>Projects Maintenance</b></p><p>Of all the freedesktop projects I created and maintained, only one doesn't have a new maintainer, <a href="https://gitlab.freedesktop.org/hadess/low-memory-monitor">low-memory-monitor</a>.</p><p>This daemon is what the <a href="https://developer-old.gnome.org/gio/stable/GMemoryMonitor.html">GMemoryMonitor</a> GLib API is based on, so it can't be replaced trivially. <a href="https://gitlab.gnome.org/GNOME/glib/-/issues/2931">Efforts seem to be under way</a> to replace it with systemd APIs.</p><p>As for the other daemons:</p><ul style="text-align: left;"><li><a href="https://gitlab.freedesktop.org/hadess/switcheroo-control/">switcheroo-control</a> got picked up by Jonas Ådahl, one of the mutter maintainers. I'm looking forward to seeing <a href="https://gitlab.freedesktop.org/hadess/switcheroo-control/-/merge_requests/68">this merge request</a> fixed so we can have better menu items on dual-GPU systems</li><li><a href="https://gitlab.freedesktop.org/hadess/iio-sensor-proxy/">iio-sensor-proxy</a> added Dylan Van Assche to its maintenance team, assisting Guido Günther.</li><li><a href="https://gitlab.freedesktop.org/upower/power-profiles-daemon">power-profiles-daemon</a> is now maintained by Marco Trevisan. It recently got support for separate system and CPU power profiles, and <a href="https://gitlab.freedesktop.org/upower/power-profiles-daemon/-/merge_requests/137">display power saving features</a> are in the works.</li></ul><p>(As an aside, there's posturing towards <a href="https://discussion.fedoraproject.org/t/f40-change-proposal-tuned-replaces-power-profiles-daemon-self-contained/94995">replacing power-profiles-daemon with tuned in Fedora</a>. 
I would advise stakeholders to figure out whether having a large Python script in the boot hot path is a good idea, to take a look at bootcharts, and then to think about whether hardware manufacturers would be able to help with supporting a tool with so many moving parts. Useful for tinkering, not for shipping in a product.)</p><p><b>Updated responsibilities</b></p><p>Since mid-August, I've joined the Platform Enablement Team. Right now, I'm helping out with maintenance of the Bluetooth kernel stack in RHEL (and thus CentOS).</p><p></p><p></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ_PzvmL81pWXLelkdOQeYRfBo6-HgQN40mcxpLdqgnE6Ey_kutFN2FB0YqNpo7QvgAzAi8AjW39EltSAW3SfAykWcwZL7BemTbS_P4B0n6afZCKQpThl7prv8BY0dWGi14UdRpwW30izWTeOIKPVgulXCvF_8NXoqsCc1pE7dIKVbD4xPk6fV8e7MPjzqFaHmc_jdyw/s761/only-throw-bluetooth.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQ_PzvmL81pWXLelkdOQeYRfBo6-HgQN40mcxpLdqgnE6Ey_kutFN2FB0YqNpo7QvgAzAi8AjW39EltSAW3SfAykWcwZL7BemTbS_P4B0n6afZCKQpThl7prv8BY0dWGi14UdRpwW30izWTeOIKPVgulXCvF_8NXoqsCc1pE7dIKVbD4xPk6fV8e7MPjzqFaHmc_jdyw/s16000/only-throw-bluetooth.png" /></a></div>The goal is to eventually pivot to hardware enablement, which is likely to involve backporting and testing, more so than upstream enablement. This is currently dependent on attending some formal kernel development (and debugging) training sessions which should make it easier to see where my hodge-podge kernel knowledge stands.<p></p><p><b>Blog backlog</b></p><p>Before being moved to a different project, and apart from the usual and very time-consuming bug triage, user support and project maintenance, I also worked on a few new features. I have a few posts planned that will lay that out.<br /></p>2024-01-31T11:33:00+00:00Peter Hutterer: New gitlab.freedesktop.org 🚯 emoji-based spamfighting abilities
http://who-t.blogspot.com/2024/01/new-gitlabfreedesktoporg-emoji-based.html
<p>
This is a follow-up from <a href="https://who-t.blogspot.com/2023/03/new-gitlabfreedesktoporg-spamfighting.html">our Spam-label approach</a>, but this time with MOAR EMOJIS because that's what the world is turning into.
</p>
<p>
Since March 2023 projects could apply the "Spam" label on any new issue and have a magic bot come in and purge the user account plus all issues they've filed; see the <a href="https://who-t.blogspot.com/2023/03/new-gitlabfreedesktoporg-spamfighting.html">earlier post</a> for details. This works quite well and gives every project member the ability to quickly purge spam. Alas, pesky spammers are using other approaches to trick Google into indexing their pork [1] (because at this point I think all this crap is just SEO spam anyway), such as commenting on issues and merge requests. We can't apply labels to comments, so we found a way to work around that: emojis!
</p>
<p>
In GitLab you can add "reactions" to issue/merge request/snippet comments and in recent GitLab versions you can register for a <a href="https://docs.gitlab.com/ee/user/project/integrations/webhook_events.html#emoji-events" target="_blank">webhook</a> to be notified when that happens. So what we've added to the gitlab.freedesktop.org instance is support for the <b>:do_not_litter:</b> (🚯) emoji [2] - if you set that on a comment, the author of said comment will be blocked and the comment content will be removed. After some safety checks, of course, so you can't just go around blocking everyone by shotgunning emojis into GitLab. Unlike the "Spam" label this does not currently work recursively so it's best to report the user so admins can purge them properly - ideally <i>before</i> setting the emoji so the abuse report contains the actual spam comment instead of the redacted one. Also note that there is a 30 second grace period to quickly undo the emoji if you happen to set it accidentally.
</p>
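A toy sketch of that comment-handling logic (the payload field names and the block/redact helpers are illustrative assumptions, not the actual freedesktop.org bot, which also implements the grace period):

```python
# Toy sketch of an emoji-webhook spam handler, loosely modelled on the
# behaviour described above. Payload field names and the block/redact
# helpers are illustrative assumptions, not the production bot.

DO_NOT_LITTER = "do_not_litter"  # 🚯

def handle_emoji_event(payload, block_user, redact_comment, is_protected):
    """React to a GitLab emoji webhook event."""
    if payload.get("object_kind") != "emoji":
        return "ignored"
    if payload["object_attributes"].get("name") != DO_NOT_LITTER:
        return "ignored"
    note = payload.get("note") or {}
    author = note.get("author_id")
    # Safety check: never act on comments from protected users
    # (project members, admins, the reporter themselves, ...).
    if author is None or is_protected(author):
        return "skipped"
    redact_comment(note["id"])  # remove the spam content first
    block_user(author)          # then block the account
    return "purged"
```

The safety checks are what keep a stray emoji from turning into a mass-blocking tool.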
<p>
Note that for purging issues, the "Spam" label is still required, the emojis only work for comments.
</p>
<p>
Happy cleanup!
</p>
<p>
<small>
[1] or pork-ish <br />
[2] Benjamin wanted to use <i>:poop:</i> but there's a chance that may get used for expressing disagreement with the comment in question <br />
</small>
</p>2024-01-29T07:58:25+00:00Hans de Goede: A fully open source stack for MIPI cameras
https://hansdegoede.livejournal.com/27909.html
Many recent Intel laptops have replaced the standard UVC USB camera module with a raw MIPI camera sensor connected to the IPU6 found in recent Intel laptop chips.<br /><br />Both the hw interface of the ISP part of the IPU6 and the image processing algorithms used are considered a trade secret, and so far the only Linux support for the IPU6 relies on an out-of-tree kernel driver with a proprietary userspace stack on top, which is currently available in <a href="https://hansdegoede.dreamwidth.org/27235.html" rel="nofollow" target="_blank">rpmfusion</a>.<br /><br />Both Linaro and Red Hat have identified the missing ISP support for various ARM and X86 chips as a problem. Linaro has started a project to add a SoftwareISP component to libcamera to allow these cameras to work without needing proprietary software, and Red Hat has joined Linaro in working on this.<br /><br /><span style="font-size: x-large;">FOSDEM talk</span><br /><br />Bryan O'Donoghue (Linaro) and I are giving <a href="https://fosdem.org/2024/schedule/event/fosdem-2024-3013-a-fully-open-source-stack-for-mipi-cameras/" rel="nofollow" target="_blank">a talk about this at FOSDEM</a>.<br /><br /><span style="font-size: x-large;">Fedora COPR repository</span><br /><br />This work is at a point now where it is ready for wider testing. 
<a href="https://copr.fedorainfracloud.org/coprs/jwrdegoede/ipu6-softisp/" rel="nofollow" target="_blank">A Fedora COPR repository</a> with a patched kernel and libcamera is now available for users to test; see <a href="https://copr.fedorainfracloud.org/coprs/jwrdegoede/ipu6-softisp/" rel="nofollow" target="_blank">the COPR page</a> for install and test instructions.<br /><br />This has been tested on the following devices:<ul><li>Lenovo ThinkPad X1 yoga gen 8 (should work on any ThinkPad with ov2740 sensor)</li><li>Dell Latitude 9420 (ov01a1s sensor)</li><li>HP Spectre x360 13.5 (2023 model, hi556 sensor)</li></ul><br /><span style="font-size: x-large;">Description of the stack</span><br /><ol><li>Kernel driver for the camera sensor. For the ov2740 used on current Lenovo designs (excluding MTL), I have landed all necessary kernel changes upstream.</li><li>Kernel support for the CSI receiver part of the IPU6. Intel is working on upstreaming this and has recently posted <a href="https://lore.kernel.org/linux-media/20240111065531.2418836-1-bingbu.cao@intel.com/" rel="nofollow" target="_blank">v3 of their patch series</a>, which is under active review.</li><li>A FOSS Software ISP stack inside libcamera to replace the missing IPU6 ISP (processing-system/psys) support. Work on this is under way. I've recently sent out <a href="https://lists.libcamera.org/pipermail/libcamera-devel/2024-January/040113.html" rel="nofollow" target="_blank">v2 of the patch series for this</a>.</li><li>Firefox pipewire camera support and support for the camera portal to get permission to access the camera. My colleague Jan Grulich has been working on this; see <a href="https://jgrulich.cz/2023/11/24/pipewire-camera-support-in-firefox-2/" rel="nofollow" target="_blank">Jan's blogpost</a>. Jan's work has landed in the just released Firefox 122.</li></ol><br />2024-01-26T16:48:18+00:00Tomeu Vizoso: Etnaviv NPU update 15: We are upstream!
https://blog.tomeuvizoso.net/2024/01/etnaviv-npu-update-15-we-are-upstream.html
<p>Today the <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25714">initial merge request for Teflon</a> was merged into Mesa, along with the first hardware driver, for <a href="https://www.verisilicon.com/en/IPPortfolio/VivanteNPUIP">VeriSilicon's Vivante NPU</a>.</p><p>For those who don't know, <a href="https://docs.mesa3d.org/teflon.html">Teflon</a> is a <a href="https://www.tensorflow.org/lite/performance/delegates">TensorFlow Lite delegate</a> that aims to support several <a href="https://en.wikipedia.org/wiki/AI_accelerator">AI accelerators</a> (also called NPUs, TPUs, APUs, NNAs, etc). Teflon is and will always be open-source, and is released under the <a href="https://en.wikipedia.org/wiki/MIT_License">MIT license</a>.<br /></p><p style="text-align: center;"><a href="https://gitlab.freedesktop.org/uploads/-/system/group/avatar/1155/gears.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="200" src="https://gitlab.freedesktop.org/uploads/-/system/group/avatar/1155/gears.png" width="200" /></a> <br /></p><p>This will have the following advantages for the project:</p><ol style="text-align: left;"><li>The userspace driver will be automatically packaged by distros such as Debian, Ubuntu, Fedora and Yocto, when they update to the next stable version: 24.1.0, which should be out around May 2024. See the <a href="https://docs.mesa3d.org/release-calendar.html">release calendar</a>.<br /></li><li>Contribution to the project will happen within the <a href="https://docs.mesa3d.org/submittingpatches.html">development process of Mesa</a>. 
This is a well-established process in which employees from companies such as Google, Valve, <a href="https://docs.mesa3d.org/drivers/powervr.html">Imagination</a>, Intel, <a href="https://docs.mesa3d.org/drivers/d3d12.html">Microsoft</a> and <a href="https://docs.mesa3d.org/drivers/radv.html">AMD</a> work together on their GPU drivers.<br /></li><li>The project has great technical infrastructure, maintained by awesome sysadmins:</li><ul><li>A well-maintained <a href="https://gitlab.freedesktop.org/">Gitlab instance</a>,</li><li><a href="https://docs.mesa3d.org/ci/index.html">extensive CI</a>, for both build and runtime testing, on real hardware,</li><li>mailing list, web server, etc.<br /></li></ul><li>More importantly, the Mesa codebase also has infrastructure that will be very useful to NPU drivers:</li><ul><li>The <a href="https://docs.mesa3d.org/nir/index.html">NIR intermediate representation</a> with loads of lowering passes. This will be immediately useful for lowering operations in models to programmable cores, but in the future I want to explore representing whole models with this, for easier manipulation and lowerings.</li><li>The <a href="https://docs.mesa3d.org/gallium/index.html">Gallium internal API</a> that decouples HW-independent frontends from HW-specific drivers. This will be critical as we add support for more NPUs, and also when we expose the hardware to other frameworks such as <a href="https://developer.android.com/ndk/guides/neuralnetworks">Android NNAPI</a>.</li></ul><li>And lastly, Mesa is part of a great yearly conference that allows contributors to discuss their work with others in a high-bandwidth environment: <a href="https://www.x.org/wiki/Events/">XDC</a>.<br /></li></ol><div><h3 style="text-align: left;">The story so far</h3><p style="text-align: left;">In 2022, while still at <a href="http://collabora.com/">Collabora</a>, I started adding OpenCL support to the <a href="https://github.com/etnaviv/etna_viv#introduction">Etnaviv</a> driver in Mesa.
Etnaviv is a userspace and kernel driver for <a href="https://www.verisilicon.com/en/IPPortfolio/VivanteNPUIP">VeriSilicon's Vivante NPUs</a>.</p><p style="text-align: left;">The goal was to accelerate machine learning workloads, but once I left Collabora to focus on the project and had implemented enough of the OpenCL specification to run a popular object classification model, I realized that there was no way I was ever going to get close to the performance of the proprietary driver by using the programmable part of the NPU.</p><p style="text-align: left;">I dug a bit deeper into how the proprietary driver was doing its thing and realized that almost all operations weren't running as shaders, but on "fixed-function" hardware units (<a href="https://en.wikipedia.org/wiki/Systolic_array">systolic arrays</a>, as I realized later).</p><p style="text-align: left;">Fortunately, all these accelerators that support matrix multiplications as individual instructions are very similar in their fundamentals, and the state of the art has been well documented in scientific publications since <a href="https://arxiv.org/abs/1704.04760">Google released their first TPU</a>.</p><p style="text-align: left;">With all this wealth of information and with the help of VeriSilicon's own debugging output and open-source kernel driver, I had a very good start at reverse engineering the hardware.
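</p><p style="text-align: left;">For the curious, the heart of such a systolic array is just a grid of multiply-accumulate units through which operands stream, one step per cycle. A toy Python model of an output-stationary array, as an illustration of the general technique only (not of VeriSilicon's actual hardware), looks like this:</p>

```python
# Conceptual sketch of an output-stationary systolic array: each processing
# element (i, j) holds one accumulator and performs one multiply-accumulate
# per cycle as operands stream through. After K cycles the grid holds A @ B.
# (Real hardware marches the data physically between neighboring PEs; here
# the "cycles" loop just makes the per-step MAC explicit.)

def systolic_matmul(A, B):
    rows, K, cols = len(A), len(B), len(B[0])
    acc = [[0] * cols for _ in range(rows)]      # one accumulator per PE
    for k in range(K):                           # cycle k: operand wavefront k
        for i in range(rows):
            for j in range(cols):
                acc[i][j] += A[i][k] * B[k][j]   # one MAC per PE per cycle
    return acc

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

<p style="text-align: left;">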
The rest was done by observing how the proprietary userspace driver interacted with the kernel, with the help of existing tools from the Etnaviv project and others that I wrote, and by staring for long hours at all the produced data in spreadsheets.<br /></p><p style="text-align: left;">During the summer and with <a href="https://libre.computer/">Libre Computer</a>'s sponsorship, I chipped away at documenting the interface to the convolution units and implementing support for them in my Mesa branch.</p><p style="text-align: left;">By <a href="https://blog.tomeuvizoso.net/2023/10/etnaviv-npu-update-9-we-got-there.html">autumn</a> I was able to run that same object classification model (<a href="https://arxiv.org/abs/1704.04861">MobileNet V1</a>) 3 times faster than the CPU could. A <a href="https://blog.tomeuvizoso.net/2023/11/etnaviv-npu-update-11-now-twice-as-fast.html">month later</a> I learned to use the other systolic array in the NPU, for tensor manipulation operations, and got it running 6 times faster than the CPU and only twice as slow as the proprietary driver.</p><p style="text-align: left;">Afterwards I got to work on object detection models, and by the <a href="https://blog.tomeuvizoso.net/2024/01/etnaviv-npu-update-14-object-detection.html">start of 2024</a> I managed to run <a href="https://arxiv.org/abs/2004.14525">SSDLite MobileDet</a> at 56 milliseconds per inference, which is around 3 times slower than what the proprietary driver achieves, but still pretty darn useful in many situations!</p><p style="text-align: left;">The rest of the time until now has been spent polishing the driver, improving its test suite and reacting to code reviews from the Mesa community.<br /></p><h3 style="text-align: left;">Next steps</h3><p style="text-align: left;">Now that the codebase is part of upstream Mesa, my work will progress in smaller batches, and I expect to spend time reviewing other people's contributions and steering the project.
People want to get this running on other variants of the VeriSilicon NPU IP and I am certainly not going to be able to do it all!</p><p style="text-align: left;">I also know of people wanting to put this together with other components in demos and solutions, so I will be supporting them so we can showcase the usefulness of all this.</p><p style="text-align: left;">There are some other use cases that this hardware is well-suited for, such as more advanced image classification, pose estimation, audio classification, depth estimation, and image segmentation. I will be looking at what the most useful models require in terms of operations and implementing them.</p><p style="text-align: left;">There is quite a lot of low-hanging fruit for improving performance, so I expect to implement support for zero-compression, more advanced tiling, better use of the SRAM in the device, and a few other things.</p><p style="text-align: left;">And at some point I should start looking at other NPU IP to add support for. The ones I'm currently leaning the most towards are Rockchip's own IP, Mediatek's, Cadence's and Amlogic's.<br /></p><h3 style="text-align: left;">Thanks</h3><p>One doesn't just start writing an NPU driver by oneself, much less without any documentation, so I need to thank the following people who have helped me greatly in this effort:</p><p><a href="http://collabora.com/">Collabora</a> for allowing me to start playing with this while I still worked with them.</p><p><a href="https://libre.computer/">Libre Computer</a> and specifically Da Xue for supporting me financially for most of 2023. They are a very small company, so I really appreciate that they believed in the project and put aside some money so I could focus on it.</p><p><a href="https://www.igalia.com/">Igalia</a> for letting <a href="https://christian-gmeiner.info/">Christian Gmeiner</a> spend time reviewing all my code and answering my questions about Etnaviv.
<br /></p><p></p><p style="text-align: left;"><a href="https://embedded-recipes.org/">Embedded Recipes</a> for giving me the opportunity to present my work last autumn in Paris.</p></div><div><p style="text-align: left;">Lucas Stach from <a href="https://www.pengutronix.de/en/index.html">Pengutronix</a> for answering my questions and listening to my problems when I suspected something was wrong in the Etnaviv kernel driver.</p><p style="text-align: left;">Neil Armstrong from <a href="https://www.linaro.org/">Linaro</a> for supporting me in the hardware enablement of the NPU driver on the Amlogic SoCs.</p><p style="text-align: left;">And a collective thanks to the DRI/Mesa community for being so awesome!<br /></p><p></p></div>2024-01-24T10:52:00+00:00Samuel Iglesias: XDC 2023: Behind the curtains
https://blogs.igalia.com/siglesias/2024/01/22/XDC-2023-Behind-the-curtains/
<p>Time flies! Back in October, <a href="https://www.igalia.com">Igalia</a> organized <a href="https://xdc2023.x.org">X.Org Developers Conference 2023</a> in A Coruña, Spain.</p>
<p>In case you don’t know it, the X.Org Developers Conference, despite the X.Org in the name, is a conference for all developers working on the open-source graphics stack: anything related to DRM/KMS, Mesa, X11 and Wayland compositors, etc.</p>
<p><img alt="A Coruña's Orzán beach" class="center-block" src="https://blogs.igalia.com/siglesias/assets/orzan-corunha-low.jpg" /></p>
<p>This year, I participated in the organization of XDC in A Coruña, Spain (<a href="https://blogs.igalia.com/siglesias/2018/10/03/xdc-2018-experience/">again!</a>) by taking care of different aspects: from logistics in the venue (<a href="https://www.palexco.com">Palexco</a>) to running it in person. It was a very tiring but fulfilling experience.</p>
<h2 id="sponsors">Sponsors</h2>
<p>First of all, I would like to thank all the sponsors for their support, as without them, this conference wouldn’t happen:</p>
<p><img alt="XDC 2023 sponsors" class="center-block" src="https://blogs.igalia.com/siglesias/assets/xdc2023_sponsors.png" /></p>
<p>They didn’t only provide financial support to the conference: <a href="https://www.igalia.com">Igalia</a> sponsored the welcome event and lunches; <a href="https://www.x.org">X.Org Foundation</a> sponsored coffee breaks; <a href="https://www.visitcoruna.com/">Tourism Office of A Coruña</a> sponsored the guided tour in the city center; and <a href="https://www.raspberrypi.com/">Raspberry Pi</a> sent <a href="https://www.raspberrypi.com/products/raspberry-pi-5/">Raspberry Pi 5 boards</a> to all speakers!</p>
<h2 id="xdc-2023-stats">XDC 2023 Stats</h2>
<p>XDC 2023 was a success in both attendance and talk submissions. Here are some stats:</p>
<ul>
<li>📈 160 registered attendees.</li>
<li>👬 120 attendees picked up their badges in person.</li>
<li>💻 25 attendees registered as virtual.</li>
<li>📺 More than 6,000 views on live stream.</li>
<li>📝 55 talks/workshops/demos distributed over three days of conference.</li>
<li>🧗♀️ There were 3 social events: the welcome event, the guided tour of the city center, and one unofficial climbing activity!</li>
</ul>
<p><img alt="XDC 2023 welcome event" class="center-block" src="https://blogs.igalia.com/siglesias/assets/xdc2023_welcome_event.jpg" /></p>
<p>Was XDC 2023 perfect organization-wise? Of course… no! Like in any event, we had some issues here and there: one with the Wi-Fi network that was quickly detected and fixed; some with the meals and coffee breaks (mainly food allergies); a few seconds of lost audio in the live stream of one talk; and other minor things. Not bad for a community-run event!</p>
<p>Nevertheless, I would like to thank all the staff at Palexco for their quick response and their understanding.</p>
<h2 id="talk-recordings--slides">Talk recordings & slides</h2>
<p><img alt="XDC 2023 talk by André Almeida" class="center-block" src="https://blogs.igalia.com/siglesias/assets/xdc2023_talk_andre.jpg" /></p>
<p>Want to watch some talks again? All conference recordings were uploaded to the <a href="https://www.youtube.com/watch?v=ouc61Ompc4E&list=PLe6I3NKr-I4K7tiw3KffqT8Gje8ZY7tyP">X.Org Foundation Youtube channel</a>.</p>
<p>Slides are available to download in <a href="https://indico.freedesktop.org/event/4/timetable/#all.detailed">each talk description</a>.</p>
<p>Enjoy!</p>
<h2 id="xdc-2024">XDC 2024</h2>
<p><img alt="XDC 2024 will be in North America" class="center-block" src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/North_America_map.svg/231px-North_America_map.svg.png" /></p>
<p>We cannot say yet where XDC 2024 is going to happen, other than that it will be in North America… but I can tell you that it will be announced soon. Stay tuned!</p>
<h2 id="want-to-organize-xdc-2025-or-xdc-2026">Want to organize XDC 2025 or XDC 2026?</h2>
<p>If we continue with the current cadence, the 2025 event would again be in Europe, and the 2026 event in North America.</p>
<p>There is a list of requirements <a href="https://www.x.org/wiki/Events/RFP/">here</a>. Nevertheless, feel free to <a href="mailto:siglesiasATigaliaDOTcom">contact me</a> or the <a href="mailto:boardATfoundation.x.org">X.Org Board of Directors</a> to get first-hand experience and knowledge about what organizing XDC entails.</p>
<p><img alt="XDC 2023 audience" class="center-block" src="https://blogs.igalia.com/siglesias/assets/xdc2023_audience.jpg" /></p>
<h2 id="thanks">Thanks</h2>
<p>Thanks to all volunteers, collaborators, Palexco staff, <a href="https://www.gpul.org">GPUL</a>, X.Org Foundation and many other people for their hard work. Special thanks to my Igalia colleague <a href="https://www.igalia.com/team/chema">Chema</a>, who did an outstanding job organizing the event together with me.</p>
<p>Thanks to the <a href="https://indico.freedesktop.org/event/4/page/24-sponsors">sponsors</a> for their extraordinary support of this conference.</p>
<p>Thanks to <a href="https://www.igalia.com">Igalia</a> not only for sponsoring the event, but also for all the support I got during the past year. I am glad to be part of this company, and I am always surprised by how great my colleagues are.</p>
<p>And last, but not least, thanks to all speakers and attendees. Without you, the conference wouldn’t exist.</p>
<p>See you at XDC 2024!</p>2024-01-22T09:06:00+00:00Simon Ser: Status update, January 2024
https://emersion.fr/blog/2024/status-update-60/
<p>Hi! This month has been pretty hectic due to the <a href="https://sourcehut.org/blog/2024-01-19-outage-post-mortem/">SourceHut network outage</a>.
Everyone on the staff team invested a lot of time and energy to minimize the
downtime as much as possible. Thankfully things have settled down now; there
are still a lot of follow-up tasks to complete, but with less urgency. I’m
really grateful for the community’s reaction: everybody has been very
understanding and supportive. Thank you!</p>
<p>In other SourceHut news, I’ve been working on <a href="https://sr.ht/~emersion/yojo/">yojo</a>, a bridge which provides
CI for Codeberg projects via builds.sr.ht. I’ve added support for pull
requests, taught yojo to handle multiple manifests, added logic to
automatically refresh access tokens before they expire, and fixed a bunch of
bugs.</p>
<p>The <abbr title="New Project of the Month">NPotM</abbr> is
<a href="https://sr.ht/~emersion/sr.ht-container-compose/">sr.ht-container-compose</a>, a <a href="https://docs.docker.com/compose/">docker-compose</a> configuration for SourceHut. It
provides an easy way to spin up a SourceHut development environment without
having to set up each service and its dependencies individually. I hope this
project can reduce friction for new SourceHut contributors. There are many
services missing, patches welcome!</p>
<p>This month, we’ve finally merged the <a href="https://github.com/swaywm/sway/pull/6844">Sway pull request</a> to use the
wlroots scene-graph API! This is exciting because it fixes a whole class of
bugs, it removes a lot of manual hand-rolled logic in Sway (e.g. rendering,
damage tracking, input event routing, direct scan-out, some of the protocol
support…), it provides nice performance optimizations via culling (e.g. the
background image is no longer painted if a web browser is covering it), and it
unlocks upcoming performance optimizations (e.g. KMS plane offloading). Many
thanks to Alexander for writing the patches and maintaining them for over a
year, and to Kirill for pushing it over the finish line!</p>
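<p>As a toy illustration of the kind of culling a scene graph enables (made-up code, not the wlroots API): once every node's geometry and opacity live in one tree, the renderer can simply skip any node that is completely covered by an opaque node stacked above it, which is exactly why the background image is no longer painted under a maximized browser.</p>

```python
# Illustrative sketch (not the wlroots wlr_scene API): a scene graph knows
# every surface's position, size and opacity, so the renderer can cull
# nodes that are completely covered by an opaque node above them -- e.g.
# the wallpaper under a maximized, opaque browser window.

def covers(a, b):
    """True if rect a = (x, y, w, h) fully contains rect b."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax <= bx and ay <= by and bx + bw <= ax + aw and by + bh <= ay + ah

def visible_nodes(nodes):
    """nodes: bottom-to-top list of dicts with 'rect' and 'opaque' keys."""
    out = []
    for i, node in enumerate(nodes):
        occluders = nodes[i + 1:]  # everything stacked above this node
        if any(o["opaque"] and covers(o["rect"], node["rect"]) for o in occluders):
            continue  # fully hidden: skip painting it
        out.append(node)
    return out

scene = [
    {"name": "wallpaper", "rect": (0, 0, 1920, 1080), "opaque": True},
    {"name": "browser",   "rect": (0, 0, 1920, 1080), "opaque": True},
]
print([n["name"] for n in visible_nodes(scene)])  # ['browser']
```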
<p>On the wlroots side, my work on <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4480"><code>wlr_surface_synced</code></a> has been
merged, allowing us to latch surface commits until an arbitrary condition is
met. This work is necessary for the upcoming explicit synchronization protocol,
as well as the work-in-progress transactions protocol and avoiding compositor
freezes when a client is very slow to render. We’ve released wlroots 0.17.1,
with a collection of bugfixes backported by Simon Zeni. Last, we’ve dropped
support for the legacy <code>wl_drm</code> protocol by default, and this caused a bit of
breakage here and there (<a href="https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1236">xserver</a>, <a href="https://github.com/intel/libva/pull/790">libva</a>, <a href="https://github.com/GPUOpen-Drivers/AMDVLK/issues/351">amdvlk</a>). We do really want to
phase out <code>wl_drm</code> though, so we’ve decided to stick with that removal.</p>
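<p>The idea behind latching commits can be sketched in a few lines (hypothetical Python, not the actual <code>wlr_surface_synced</code> C API): committed client state is queued and only applied once its condition holds, while preserving commit order.</p>

```python
# Conceptual sketch (names are made up, not the wlr_surface_synced API):
# latch a client's surface commits and only apply them once an arbitrary
# condition -- e.g. "the GPU fence signaled" or "all surfaces in the
# transaction are ready" -- is satisfied.

class LatchedSurface:
    def __init__(self):
        self.current = None       # state the compositor actually displays
        self.pending = []         # committed states waiting on a condition

    def commit(self, state, condition):
        """Queue a commit; `condition` is a callable returning True once
        the commit may be applied."""
        self.pending.append((state, condition))
        self.flush()

    def flush(self):
        """Apply queued commits in order, stopping at the first whose
        condition is not yet met (commits must stay ordered)."""
        while self.pending and self.pending[0][1]():
            self.current, _ = self.pending.pop(0)

surface = LatchedSurface()
fence_signaled = False
surface.commit("frame-1", lambda: fence_signaled)  # latched: fence pending
print(surface.current)   # None
fence_signaled = True
surface.flush()          # condition now holds -> commit applied
print(surface.current)   # frame-1
```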
<p>This month’s collection of miscellaneous project updates includes
<a href="https://github.com/emersion/go-imap/releases/tag/v2.0.0-alpha.8">go-imap v2 alpha 8</a> with separate types for sequence numbers and UIDs, which
was a lot of work to get right but I think was worth it. I’ve also released
<a href="https://github.com/emersion/go-maildir/releases/tag/v0.4.0">go-maildir v0.4.0</a> with a new <code>Walk</code> function (to iterate over messages without
allocating a list) and numerous fixes. I’ve sent a <a href="https://gitlab.com/gitlab-org/cli/-/merge_requests/1395">GitLab cli patch</a>
to fix invalid release asset links for third-party GitLab instances, and a
<a href="https://github.com/mesonbuild/meson/pull/12718">Meson patch</a> to add C23 support.</p>
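<p>To illustrate why the separate types are worth the trouble (a hypothetical Python analogue, not go-imap’s real Go API): IMAP sequence numbers and UIDs are both plain integers on the wire but mean very different things, so distinct wrapper types make a mix-up fail loudly instead of silently fetching the wrong message.</p>

```python
# Hypothetical Python analogue of the go-imap v2 design (not its real API):
# a sequence number is a message's current 1-based position and shifts when
# messages are expunged, while a UID is a stable identifier assigned at
# delivery. Wrapping them in distinct types turns "passed the wrong kind
# of number" into an immediate error instead of a silent off-by-N bug.
from dataclasses import dataclass

@dataclass(frozen=True)
class SeqNum:
    value: int   # position in the mailbox right now

@dataclass(frozen=True)
class UID:
    value: int   # stable identifier, survives expunges

def fetch_by_uid(mailbox, uid):
    """mailbox: dict mapping UID integers to message bodies."""
    if not isinstance(uid, UID):
        raise TypeError("fetch_by_uid() needs a UID, not a sequence number")
    return mailbox[uid.value]

mbox = {1042: "Hello", 1043: "Re: Hello"}
print(fetch_by_uid(mbox, UID(1043)))   # Re: Hello
# fetch_by_uid(mbox, SeqNum(2)) raises TypeError instead of silently
# fetching whatever message happens to sit at position 2.
```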
<p>See you next month!</p>2024-01-20T22:00:00+00:00Matthias Klumpp: Wayland really breaks things… Just for now?
https://blog.tenstral.net/2024/01/wayland-really-breaks-things-just-for-now.html
<p>This post is in part a response to an aspect of Nate’s post “<a href="https://pointieststick.com/2023/12/26/does-wayland-really-break-everything/">Does Wayland really break everything?</a>“, but also my reflection on discussing Wayland protocol additions, a unique pleasure that I have been involved with for the past months<sup class="fn"><a href="https://blog.tenstral.net/category/planet/fdo/feed#ddee6e08-d4f7-4154-a7e3-4d67d99399da" id="ddee6e08-d4f7-4154-a7e3-4d67d99399da-link">1</a></sup>.</p>
<h3 class="wp-block-heading">Some facts</h3>
<p>Before I start I want to make a few things clear: The Linux desktop will be moving to Wayland<sup class="fn"><a href="https://blog.tenstral.net/category/planet/fdo/feed#9f1f8a66-e687-4060-b58b-00d1f11bcf16" id="9f1f8a66-e687-4060-b58b-00d1f11bcf16-link">2</a></sup> – this is a fact at this point (and has been for a while); sticking to X11 makes no sense for future projects. From reading Wayland protocols and working with it at a much lower level than I ever wanted to, it is also very clear to me that Wayland is an exceptionally well-designed core protocol, and so are the additional extension protocols (xdg-shell & Co.). The modularity of Wayland is great: it gives the protocol incredible flexibility and will for sure turn out to be good for the long-term viability of this project (and also provides a path to correct protocol issues in the future, if any are found). In other words: Wayland is an amazing foundation to build on, and a lot of its design decisions make a lot of sense!</p>
<p>The shift towards people seeing “Linux” more as an application developer platform, and taking PipeWire and XDG Portals into account when designing for Wayland is also an amazing development and I love to see this – this holistic approach is something I always wanted!</p>
<p>Furthermore, I think Wayland <em>removes</em> a lot of functionality that <em>shouldn’t exist</em> in a modern compositor – and that’s a good thing too! Some of X11’s features and design decisions had clear drawbacks that we shouldn’t replicate. I highly recommend to read Nate’s blog post, it’s very good and goes into more detail. And due to all of this, I firmly believe that any advancement in the Wayland space must come from within the project.</p>
<h3 class="wp-block-heading">But!</h3>
<p>But! Of course there was a “but” coming <img alt="😉" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f609.png" style="height: 1em;" /> – I think while developing Wayland-as-an-ecosystem we are now entrenched into narrow concepts of how a desktop should work. While discussing Wayland protocol additions, a lot of concepts clash, people from different desktops with different design philosophies debate the merits of those over and over again never reaching any conclusion (just as you will never get an answer out of humans whether sushi or pizza is the clearly superior food, or whether CSD or SSD is better). Some people want to use Wayland as a vehicle to force applications to submit to their desktop’s design philosophies, others prefer the smallest and leanest protocol possible, other developers want the most elegant behavior possible. To be clear, I think those are all very valid approaches.</p>
<p>But this also creates problems: By switching to Wayland compositors, we are already forcing a lot of porting work onto toolkit developers and application developers. This is annoying, but just work that has to be done. It becomes frustrating though if Wayland provides toolkits with absolutely no way to reach their goal in any reasonable way. For Nate’s Photoshop analogy: Of course Linux does not break Photoshop; it is Adobe’s responsibility to port it. But what if Linux were missing a crucial syscall that Photoshop needed for proper functionality and Adobe couldn’t port it without that? In that case it becomes much less clear who is to blame for Photoshop not being available.</p>
<p>A lot of Wayland protocol work is focused on the environment and design, while applications and the work needed to port them are often given less consideration. I think this happens because the overlap between application developers and developers of the desktop environments is not necessarily large, and the overlap with people willing to engage with Wayland upstream is even smaller. The combination of Windows developers porting apps to Linux <em>and</em> having involvement with toolkits or Wayland is pretty much nonexistent. So they have less of a voice.</p>
<h3 class="wp-block-heading">A quick detour through the neuroscience research lab</h3>
<p>I have been involved with Freedesktop, GNOME and KDE for an incredibly long time now (more than a decade), but my actual job (besides consulting for Purism) is that of a PhD candidate in a neuroscience research lab (working on the morphology of biological neurons and its relation to behavior). I am mostly involved with three research groups in our institute, which is about 35 people. Most of us do all our data analysis on powerful servers which we connect to using RDP (with KDE Plasma as desktop). Since I joined, I have been pushing the envelope a bit to extend Linux usage to data acquisition and regular clients, and to have our data acquisition hardware interface well with it. Linux brings some unique advantages for use in research, besides the obvious one of having every step of your data management platform introspectable with no black boxes left, a goal I value very highly in research (but this would be its own blogpost).</p>
<p>In terms of operating system usage though, most systems are still Windows-based. Windows is what companies develop for, and what people use by default and are familiar with. The choice of operating system is very strongly driven by application availability, and <a href="https://en.wikipedia.org/wiki/Windows_Subsystem_for_Linux">WSL</a> being really good makes this somewhat worse, as it removes the need for people to switch to a real Linux system entirely if there is the occasional software requiring it. Yet, we have a lot more Linux users than before, and use it in many places where it makes sense. I also developed novel data acquisition software that is even Linux-only and uses the abilities of the platform to its fullest extent. All of this resulted in me asking existing software and hardware vendors for Linux support a lot more often. The vendor-customer relationship in science is usually pretty good, and vendors do usually want to help out. Same for open source projects, especially if you offer to do Linux porting work for them… But overall, the ease of use and availability of required applications and their usability rule supreme. Most people are not technically knowledgeable and just want to get their research done in the best way possible, getting the best results with the least amount of friction.</p>
<figure class="wp-block-image size-large is-resized"><a href="https://blog.tenstral.net/wp-content/uploads/2024/01/inthewild_cern.jpg"><img alt="" class="wp-image-2025" height="576" src="https://blog.tenstral.net/wp-content/uploads/2024/01/inthewild_cern-1024x576.jpg" style="width: 840px; height: auto;" width="1024" /></a>KDE/Linux usage at a control station for a particle accelerator at Adlershof Technology Park, Germany, for reference (<a href="https://25years.kde.org/de/">by 25years of KDE</a>)<sup class="fn"><a href="https://blog.tenstral.net/category/planet/fdo/feed#d38e2c51-f896-4f30-a445-2c34cbddafe5" id="d38e2c51-f896-4f30-a445-2c34cbddafe5-link">3</a></sup></figure>
<h3 class="wp-block-heading">Back to the point</h3>
<p>The point of that story is this: GNOME, KDE, RHEL, Debian or Ubuntu: They all do not matter if the necessary applications are not available for them. And as soon as they are, the easiest-to-use solution wins. There are many facets of “easiest”: In many cases this is RHEL due to Red Hat support contracts being available, in many other cases it is Ubuntu due to its mindshare and ease of use. KDE Plasma is also frequently seen, as it is perceived a bit easier to onboard Windows users with it (among other benefits). Ultimately, it comes down to applications and 3rd-party support though.</p>
<p>Here’s a dirty secret: In many cases, porting an application to Linux is not that difficult. The thing that companies (and FLOSS projects too!) struggle with and will calculate the merits of carefully in advance is whether it is worth the support cost as well as continuous QA/testing. Their staff will have to do all of that work, and they could spend that time on other tasks after all.</p>
<p>So if they learn that “porting to Linux” not only means added testing and support, but also means to choose between the legacy X11 display server that allows for 1:1 porting from Windows or the “new” Wayland compositors that do not support the same features they need, they will quickly consider it not worth the effort at all. I have seen this happen.</p>
<p>Of course many apps use a cross-platform toolkit like Qt, which greatly simplifies porting. But this just moves the issue one layer down, as now the toolkit needs to abstract Windows, macOS and Wayland. And Wayland does not contain features to do certain things or does them very differently from e.g. Windows, so toolkits have no way to actually implement the existing functionality in a way that works on all platforms. So in Qt’s documentation you will often find texts like “works everywhere except for on Wayland compositors or mobile”<sup class="fn"><a href="https://blog.tenstral.net/category/planet/fdo/feed#3c0e34c4-43ff-47f1-84b4-b99a7d9aac2f" id="3c0e34c4-43ff-47f1-84b4-b99a7d9aac2f-link">4</a></sup>.</p>
<p>Many missing bits or altered behavior are just <a href="https://en.wikipedia.org/wiki/Paper_cut_bug">papercuts</a>, but those add up. And if users will have a worse experience, this will translate to more support work, or people not wanting to use the software on the respective platform.</p>
<h3 class="wp-block-heading">What’s missing?</h3>
<h4 class="wp-block-heading">Window positioning</h4>
<p>SDI applications with multiple windows are very popular in the scientific world. For data acquisition (for example with microscopes) we often have one monitor with control elements and one larger one with the recorded image. There are also other configurations where multiple signal modalities are acquired, and the experimenter aligns windows exactly in the way they want and expects the layout to be stored and to be loaded upon reopening the application. Even in the image from Adlershof Technology Park above you can see this style of UI design, at mega-scale. Being able to pop out elements as windows from a single-window application to move them around freely is another frequently used paradigm, and immensely useful with these complex apps.</p>
<p>It is important to note that this is not a legacy design, but in many cases an intentional choice – these kinds of apps work incredibly well on larger screens or many screens and are very flexible (you can have any window configuration you want, and switch between them using the (usually) great window management abilities of your desktop).</p>
<p>Of course, these apps will work terribly on tablets and small form factors, but that is not the purpose they were designed for and nobody would use them that way.</p>
<p>I assumed for sure these features would be implemented at some point, but when it became clear that that would not happen, I created the <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/247">ext-placement</a> protocol which had some good discussion but was ultimately rejected from the <code>xdg</code> namespace. I then tried another solution based on feedback, which turned out not to work for most apps, and now proposed <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/264">xdg-placement (v2)</a> in an attempt to maybe still get some protocol done that we can agree on, exploring more options before pushing the existing protocol for inclusion into the <code>ext</code> Wayland protocol namespace. Meanwhile though, we can not port any application that needs this feature, while at the same time we are switching desktops and distributions to Wayland by default.</p>
<h4 class="wp-block-heading">Window position restoration</h4>
<p>Similarly, a protocol to <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/18">save & restore window positions</a> was already proposed in 2018, 6 years ago now, but it has still not been agreed upon, and may not even help multiwindow apps in its current form. The absence of this protocol means that applications can not restore their former window positions, and the user has to move them to their previous place again and again.</p>
<p>Meanwhile, toolkits can not adopt these protocols and applications can not use them and can not be ported to Wayland without introducing papercuts.</p>
<h4 class="wp-block-heading">Window icons</h4>
<p>Similarly, individual windows can not set their own icons, and not-installed applications can not have an icon at all because there is no desktop-entry file to load the icon from and no icon in the theme for them. You would think this is a niche issue, but for applications that create many windows, providing icons for them so the user can find them is fairly important. Of course it’s not the end of the world if every window has the same icon, but it’s one of those papercuts that make the software slightly less user-friendly. Even applications with fewer windows like LibrePCB <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/issues/52#note_2155885">are affected</a>, so much so that they would rather run their app through Xwayland for now.</p>
<p>I decided to address this after I was working on data analysis of image data in a <a href="https://docs.python.org/3/library/venv.html">Python virtualenv</a>, where my code and the Python libraries used created lots of windows all with the default yellow “W” icon, making it impossible to distinguish them at a glance. This is <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/269">xdg-toplevel-icon</a> now, but of course it is an uphill battle where the very premise of needing this is questioned. So applications can not use it yet. </p>
<h4 class="wp-block-heading">Limited window abilities requiring specialized protocols</h4>
<p>Firefox has a <a href="https://support.mozilla.org/en-US/kb/about-picture-picture-firefox">picture-in-picture feature</a>, allowing it to pop out media from a mediaplayer as a separate floating window so the user can watch the media while doing other things. On X11 this is easily realized, but on Wayland the restrictions posed on windows necessitate a different solution. The <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/132">xdg-pip</a> protocol was proposed for this specialized usecase, but it is also not merged yet. So this feature does not work as well on Wayland.</p>
<h4 class="wp-block-heading">Automated GUI testing / accessibility / automation</h4>
<p>Automation of GUI tasks is a powerful feature, as is the ability to auto-test GUIs. This is being worked on, with <a href="https://libinput.pages.freedesktop.org/libei/">libei</a> and <a href="https://gitlab.freedesktop.org/ofourdan/xwayland-run">wlheadless-run</a> (and stuff like <a href="https://github.com/ReimuNotMoe/ydotool">ydotool</a> exists too), but we’re not fully there yet.</p>
<h3 class="wp-block-heading">Wayland is frustrating for (some) application authors</h3>
<p>As you see, there are valid applications and valid usecases that can not be ported yet to Wayland with the same feature range they enjoyed on X11, Windows or macOS. So, from an application author’s perspective, Wayland <em>does</em> break things quite significantly, because things that worked before can no longer work and Wayland (the whole stack) does not provide any avenue to achieve the same result.</p>
<p>Wayland does “break” screen sharing, global hotkeys, gaming latency (via “no tearing”) etc, however for all of these there are solutions available that application authors can port to. And most developers will gladly do that work, especially since the newer APIs are usually a lot better and more robust. But if you give application authors no path forward except “use Xwayland and be on emulation as second-class citizen forever”, it just results in very frustrated application developers.</p>
<p>For some application developers, switching to a Wayland compositor is like buying a canvas from the Linux shop that forces your brush to only draw triangles. But maybe for your avant-garde art, you need to draw a circle. You can approximate one with triangles, but it will never be as good as the artwork of your friends who got their canvases from the Windows or macOS art supply shop and have more freedom to create their art.</p>
<h3 class="wp-block-heading">Triangles are proven to be the best shape! If you are drawing circles you are creating bad art!</h3>
<p>Wayland, via its protocol limitations, forces a certain way to build application UX – often for the better, but also sometimes to the detriment of users and applications. The protocols are often fairly opinionated, a result of the lessons learned from X11. In any case though, it is the odd one out – Windows and macOS do not pose the same limitations (for better or worse!), and the effort to port to Wayland is orders of magnitude bigger, or sometimes in case of the multiwindow UI paradigm impossible to achieve to the same level of polish. Desktop environments of course have a design philosophy that they want to push, and want applications to integrate as much as possible (same as macOS and Windows!). However, there are many applications out there, and pushing a design via protocol limitations will likely just result in fewer apps.</p>
<h3 class="wp-block-heading">The porting dilemma</h3>
<p>I spent probably way too much time looking into how to get applications cross-platform and running on Linux, often talking to vendors (FLOSS and proprietary) as well. Wayland limitations aren’t the biggest issue by far, but they do start to come up now, especially in the scientific space with Ubuntu having switched to Wayland by default. For application authors there is often no way to address these issues. Many scientists do not even understand why their Python script that creates some GUIs suddenly behaves weirdly because Qt is now using the Wayland backend on Ubuntu instead of X11. They do not know the difference and also do not want to deal with these details – even though they may be programmers as well, the real goal is not to fiddle with the display server, but to get to a scientific result somehow.</p>
<p>Another issue is portability layers like Wine which need to run Windows applications as-is on Wayland. Apparently Wine’s Wayland driver has some heuristics to make window positioning work (and I am amazed by the work done on this!), but that can only go so far.</p>
<h3 class="wp-block-heading">A way out?</h3>
<p>So, how would we actually solve this? Fundamentally, this excessively long blog post boils down to just one essential question:</p>
<p><strong>Do we want to force applications to submit to a UX paradigm unconditionally, potentially losing out on application ports or keeping apps on X11 eternally, or do we want to throw them some rope to get as many applications ported over to Wayland, even though we might sacrifice some protocol purity?</strong></p>
<p>I think we really have to answer that to make the discussions on wayland-protocols a lot less grueling. This question can be answered at the wayland-protocols level, but even more so it <em>must</em> be answered by the individual desktops and compositors.</p>
<p>If the answer for your environment turns out to be “Yes, we want the Wayland protocol to be more opinionated and will not make any compromises for application portability”, then your desktop/compositor should just immediately NACK protocols that add something like this and you simply shouldn’t engage in the discussion, as you reject the very premise of the new protocol: That it has any merit to exist and is needed in the first place. In this case contributors to Wayland and application authors also know where you stand, and a lot of debate is skipped. Of course, if application authors want to support your environment, you are basically asking them now to rewrite their UI, which they may or may not do. But at least they know what to expect and how to target your environment.</p>
<p>If the answer turns out to be “We do want some portability”, the next question obviously becomes where the line should be drawn and which changes are acceptable and which aren’t. We can’t blindly copy all X11 behavior; some porting work to Wayland is simply inevitable. Some written rules for that might be nice, but probably more importantly, if you agree fundamentally that there is an issue to be fixed, please engage in the discussions for the respective MRs! We for sure do not want to repeat X11 mistakes, and I am certain that we can implement protocols which provide the required functionality in a way that is a nice compromise in allowing applications a path forward into the Wayland future, while also being as good as possible and improving upon X11. For example, the toplevel-icon proposal is already a lot better than anything X11 ever had. <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/192">Relaxing ACK requirements for the ext namespace</a> is also a good proposed administrative change, as it allows some compositors to add features they want to support to the shared repository more easily, while also not mandating them for others. In my opinion, it would allow for a lot less friction between the two different ideas of how Wayland protocol development should work. Some compositors could move forward and support more protocol extensions, while more restrictive compositors could support fewer things. Applications can detect supported protocols at launch and change their behavior accordingly (ideally even abstracted by toolkits).</p>
<p>You may now say that a lot of apps are ported, so surely this issue can not be that bad. And yes, what Wayland provides today may be enough for 80-90% of all apps. But what I hope the detour into the research lab has done is convince you that this smaller percentage of apps <em>matters</em>. A lot. And that it may be worthwhile to support them.</p>
<p>To end on a positive note: When it came to porting concrete apps over to Wayland, the only real showstoppers so far<sup class="fn"><a href="https://blog.tenstral.net/category/planet/fdo/feed#3cb0e40a-1202-4734-b501-c8ef302c458a" id="3cb0e40a-1202-4734-b501-c8ef302c458a-link">5</a></sup> were the missing window-positioning and window-position-restore features. I encountered them when porting my own software, and I got the issue as feedback from colleagues and fellow engineers. In second place was UI testing and automation support; the window-icon issue was mentioned twice, but being a cosmetic issue it likely simply hurts people less and they can ignore it more easily.</p>
<p>What this means is that the majority of apps are already fine, and many others are very, very close! A Wayland future for everyone is within our grasp! <img alt="😄" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f604.png" style="height: 1em;" /></p>
<p>I will also bring my two protocol MRs to their conclusion for sure, because as application developers we need clarity on what the platform (either all desktops or even just a few) supports and will or will not support in future. And the only way to get something good done is by contribution and friendly discussion.</p>
<h4 class="wp-block-heading">Footnotes</h4>
<ol class="wp-block-footnotes"><li id="ddee6e08-d4f7-4154-a7e3-4d67d99399da">Apologies for the clickbait-y title – it comes with the subject <img alt="😉" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f609.png" style="height: 1em;" /> <a href="https://blog.tenstral.net/category/planet/fdo/feed#ddee6e08-d4f7-4154-a7e3-4d67d99399da-link"><img alt="↩" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/21a9.png" style="height: 1em;" />︎</a></li><li id="9f1f8a66-e687-4060-b58b-00d1f11bcf16">When I talk about “Wayland” I mean the combined set of display server protocols and accepted protocol extensions, unless otherwise clarified. <a href="https://blog.tenstral.net/category/planet/fdo/feed#9f1f8a66-e687-4060-b58b-00d1f11bcf16-link"><img alt="↩" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/21a9.png" style="height: 1em;" />︎</a></li><li id="d38e2c51-f896-4f30-a445-2c34cbddafe5">I would have picked a picture from our lab, but that would have needed permission first <a href="https://blog.tenstral.net/category/planet/fdo/feed#d38e2c51-f896-4f30-a445-2c34cbddafe5-link"><img alt="↩" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/21a9.png" style="height: 1em;" />︎</a></li><li id="3c0e34c4-43ff-47f1-84b4-b99a7d9aac2f">Qt has awesome “platform issues” pages, like for <a href="https://doc.qt.io/qt-6/macos-issues.html">macOS</a> and <a href="https://doc.qt.io/qt-6/linux-issues.html">Linux/X11</a> which help with porting efforts, but Qt doesn’t even list Linux/Wayland as <a href="https://doc.qt.io/qt-6/supported-platforms.html">supported platform</a>. There is some information though, like <a href="https://doc.qt.io/qt-6/application-windows.html#wayland-peculiarities">window geometry peculiarities</a>, which aren’t particularly helpful when porting (but still essential to know). 
<a href="https://blog.tenstral.net/category/planet/fdo/feed#3c0e34c4-43ff-47f1-84b4-b99a7d9aac2f-link"><img alt="↩" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/21a9.png" style="height: 1em;" />︎</a></li><li id="3cb0e40a-1202-4734-b501-c8ef302c458a">Besides issues with Nvidia hardware – CUDA for simulations and machine-learning is pretty much everywhere, so Nvidia cards are common, which causes trouble on Wayland still. It is improving though. <a href="https://blog.tenstral.net/category/planet/fdo/feed#3cb0e40a-1202-4734-b501-c8ef302c458a-link"><img alt="↩" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/21a9.png" style="height: 1em;" />︎</a></li></ol>2024-01-11T16:24:00+00:00Maira Canal: Introducing CPU jobs to the Raspberry Pi
https://mairacanal.github.io/introducing-cpu-jobs-to-the-rpi/
<p><a href="https://www.igalia.com">Igalia</a> is always working hard to improve 3D rendering
drivers of the Broadcom VideoCore GPU, found in Raspberry Pi devices. One of our
most recent efforts in this sense was the implementation of CPU jobs from the
Vulkan driver to the V3D kernel driver.</p>
<h2 id="what-are-cpu-jobs-and-why-do-we-need-them">What are CPU jobs and why do we need them?</h2>
<p>In the V3DV driver, there are some Vulkan commands that cannot be performed by
the GPU alone, so we implement those as CPU jobs on Mesa. A CPU job is a job
that requires CPU intervention to be performed. For example, in the Broadcom
VideoCore GPUs, we don’t have a way to calculate the timestamp. But we need the
timestamp for Vulkan <a href="https://docs.vulkan.org/samples/latest/samples/api/timestamp_queries/README.html">timestamp
queries</a>.
Therefore, we need to calculate the timestamp on the CPU.</p>
<p>A CPU job in userspace also implies CPU stalling. Sometimes, we need to hold
part of the command submission flow in order to correctly synchronize their
execution. This waiting period causes the CPU to stall, preventing the
continuous submission of jobs to the GPU. To mitigate this issue, we decided to
move CPU job mechanisms from the V3DV driver to the V3D kernel driver.</p>
<p>In the V3D kernel driver, we have different kinds of jobs: RENDER jobs, BIN
jobs, CSD jobs, TFU jobs, and CLEAN CACHE jobs. For each of those jobs, we have
a DRM scheduler instance that helps us to synchronize the jobs.</p>
<blockquote>
<p>If you want to know more about the different kinds of V3D jobs, check out this
<a href="https://mairacanal.github.io/november-update-exploring-v3d/">November Update: Exploring
V3D</a> blogpost,
where I explain more about all the V3D
<a href="https://en.wikipedia.org/wiki/Ioctl">IOCTLs</a> and jobs.</p>
</blockquote>
<p>Jobs of the same kind are dispatched and processed in the order they were
submitted, using a standard first-in-first-out (FIFO) queue. We
can synchronize different jobs across different queues using DRM syncobjs. More
about the V3D synchronization framework and user extensions can be learned in
<a href="https://melissawen.github.io/blog/2022/05/10/multisync-p1">this two-part blog
post</a> from Melissa
Wen.</p>
<blockquote>
<p>From the kernel documentation, DRM syncobjs (synchronisation objects) are
containers for stuff that helps sync up GPU commands. They’re super handy
because you can use them in your own programs, share them with other programs,
and even use them across different DRM drivers. Mostly, they’re used for
making Vulkan fences and semaphores work.</p>
</blockquote>
<p>By moving the CPU job from userspace to the kernel, we can make use of the DRM
scheduler queues and all the advantages they bring. For this, we created a
new type of job in the V3D kernel driver, a CPU job, which also means creating a
new DRM scheduler instance and a CPU job queue. Now, instead of stalling the
submission thread waiting for the GPU to idle, we can use DRM syncobjs to
synchronize both CPU and GPU jobs in a submission, providing more efficient
usage of the GPU.</p>
<h2 id="how-did-we-implement-the-cpu-jobs-in-the-kernel-driver">How did we implement the CPU jobs in the kernel driver?</h2>
<p>After we decided to have a CPU job implementation in the kernel space, we
considered two possible designs for this job: creating an IOCTL for
each type of CPU job or using a user extension to provide a polymorphic behavior
to a single CPU job IOCTL.</p>
<p>We have different types of CPU jobs (indirect CSD jobs, timestamp query jobs,
copy query results jobs…) and each of them has a common infrastructure
of allocation and synchronization but performs different operations. Therefore,
we decided to go with the option to use user extensions.</p>
<p>In <a href="https://melissawen.github.io/blog/2022/05/10/multisync-p1">Melissa’s blogpost</a>, she digs
deep into the implementation of generic IOCTL extensions in the V3D kernel
driver. But, to put it simply, instead of expanding the data struct for each
IOCTL every time we need to add a new feature, we define a user extension chain
instead. As we add new optional interfaces to control the IOCTL, we define a new
extension struct that can be linked to the IOCTL data only when required by the
user.</p>
<p>Therefore, we created a new IOCTL, <code class="language-plaintext highlighter-rouge">drm_v3d_submit_cpu</code>, which is used to submit
any type of CPU job. This single IOCTL can be extended by a user extension,
which allows us to reuse the common infrastructure - avoiding code
repetition - while using the user extension ID to identify the type of job
and perform the corresponding operation.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">drm_v3d_submit_cpu</span> <span class="p">{</span>
<span class="cm">/* Pointer to a u32 array of the BOs that are referenced by the job.
*
* For DRM_V3D_EXT_ID_CPU_INDIRECT_CSD, it must contain only one BO,
* that contains the workgroup counts.
*
* For DRM_V3D_EXT_ID_TIMESTAMP_QUERY, it must contain only one BO,
* that will contain the timestamp.
*
* For DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY, it must contain only
* one BO, that contains the timestamp.
*
* For DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY, it must contain two
* BOs. The first is the BO where the timestamp queries will be written
* to. The second is the BO that contains the timestamp.
*
* For DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY, it must contain no
* BOs.
*
* For DRM_V3D_EXT_ID_CPU_COPY_PERFORMANCE_QUERY, it must contain one
* BO, where the performance queries will be written.
*/</span>
<span class="n">__u64</span> <span class="n">bo_handles</span><span class="p">;</span>
<span class="cm">/* Number of BO handles passed in (size is that times 4). */</span>
<span class="n">__u32</span> <span class="n">bo_handle_count</span><span class="p">;</span>
<span class="n">__u32</span> <span class="n">flags</span><span class="p">;</span>
<span class="cm">/* Pointer to an array of ioctl extensions. */</span>
<span class="n">__u64</span> <span class="n">extensions</span><span class="p">;</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Now, we can create a CPU job and submit it with a CPU job user extension.</p>
<p>And which extensions are available?</p>
<ol>
<li><a href="https://cgit.freedesktop.org/drm/drm-misc/commit/?id=18b8413b25b7070fa2e55858a2c808e6909581d0"><code class="language-plaintext highlighter-rouge">DRM_V3D_EXT_ID_CPU_INDIRECT_CSD</code></a>:
this CPU job allows us to submit an indirect CSD job. An indirect CSD job is a
job that, when executed in the queue, will map an indirect buffer, read the
dispatch parameters, and submit a regular dispatch. This CPU job is used in
Vulkan calls like <code class="language-plaintext highlighter-rouge">vkCmdDispatchIndirect()</code>.</li>
<li><a href="https://cgit.freedesktop.org/drm/drm-misc/commit/?id=9ba0ff3e083f6a4a0b6698f06bfff74805fefa5f"><code class="language-plaintext highlighter-rouge">DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY</code></a>:
this CPU job calculates the query timestamp and updates the query availability
by signaling a syncobj. This CPU job is used in Vulkan calls like <code class="language-plaintext highlighter-rouge">vkCmdWriteTimestamp()</code>.</li>
<li><a href="https://cgit.freedesktop.org/drm/drm-misc/commit/?id=34a101e64296c736b14ce27e647fcebd70cb7bf8"><code class="language-plaintext highlighter-rouge">DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY</code></a>:
this CPU job resets the timestamp queries based on the value offset of the first
query. This CPU job is used in Vulkan calls like <code class="language-plaintext highlighter-rouge">vkCmdResetQueryPool()</code> for timestamp queries.</li>
<li><a href="https://cgit.freedesktop.org/drm/drm-misc/commit/?id=6745f3e44a20ac18e7e5a40a3c7f62225983d544"><code class="language-plaintext highlighter-rouge">DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY</code></a>:
this CPU job copies the complete or partial result of a query to a buffer.
This CPU job is used in Vulkan calls like <code class="language-plaintext highlighter-rouge">vkCmdCopyQueryPoolResults()</code> for timestamp queries.</li>
<li><a href="https://cgit.freedesktop.org/drm/drm-misc/commit/?id=bae7cb5d68001a8d4ceec5964dda74bb9aab7220"><code class="language-plaintext highlighter-rouge">DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY</code></a>:
this CPU job resets the performance queries by resetting the values of the
perfmons. This CPU job is used in Vulkan calls like <code class="language-plaintext highlighter-rouge">vkCmdResetQueryPool()</code> for performance queries.</li>
<li><a href="https://cgit.freedesktop.org/drm/drm-misc/commit/?id=209e8d2695ee7a67a5b0487bbd1aa75e290d0f41"><code class="language-plaintext highlighter-rouge">DRM_V3D_EXT_ID_CPU_COPY_PERFORMANCE_QUERY</code></a>:
similar to <code class="language-plaintext highlighter-rouge">DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY</code>, this CPU job copies the
complete or partial result of a query to a buffer. This CPU job is used in Vulkan
calls like <code class="language-plaintext highlighter-rouge">vkCmdCopyQueryPoolResults()</code> for performance queries.</li>
</ol>
<p>The CPU job IOCTL structure is similar to any other V3D job. We allocate the job
struct, parse all the extensions, init the job, look up the BOs and lock their
reservations, add the proper dependencies, and push the job to the DRM scheduler
entity.</p>
<p>When running a CPU job, we execute the following code:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="k">const</span> <span class="n">v3d_cpu_job_fn</span> <span class="n">cpu_job_function</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">[</span><span class="n">V3D_CPU_JOB_TYPE_INDIRECT_CSD</span><span class="p">]</span> <span class="o">=</span> <span class="n">v3d_rewrite_csd_job_wg_counts_from_indirect</span><span class="p">,</span>
<span class="p">[</span><span class="n">V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY</span><span class="p">]</span> <span class="o">=</span> <span class="n">v3d_timestamp_query</span><span class="p">,</span>
<span class="p">[</span><span class="n">V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY</span><span class="p">]</span> <span class="o">=</span> <span class="n">v3d_reset_timestamp_queries</span><span class="p">,</span>
<span class="p">[</span><span class="n">V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY</span><span class="p">]</span> <span class="o">=</span> <span class="n">v3d_copy_query_results</span><span class="p">,</span>
<span class="p">[</span><span class="n">V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY</span><span class="p">]</span> <span class="o">=</span> <span class="n">v3d_reset_performance_queries</span><span class="p">,</span>
<span class="p">[</span><span class="n">V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY</span><span class="p">]</span> <span class="o">=</span> <span class="n">v3d_copy_performance_query</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">static</span> <span class="k">struct</span> <span class="n">dma_fence</span> <span class="o">*</span>
<span class="nf">v3d_cpu_job_run</span><span class="p">(</span><span class="k">struct</span> <span class="n">drm_sched_job</span> <span class="o">*</span><span class="n">sched_job</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">v3d_cpu_job</span> <span class="o">*</span><span class="n">job</span> <span class="o">=</span> <span class="n">to_cpu_job</span><span class="p">(</span><span class="n">sched_job</span><span class="p">);</span>
<span class="k">struct</span> <span class="n">v3d_dev</span> <span class="o">*</span><span class="n">v3d</span> <span class="o">=</span> <span class="n">job</span><span class="o">-></span><span class="n">base</span><span class="p">.</span><span class="n">v3d</span><span class="p">;</span>
<span class="n">v3d</span><span class="o">-></span><span class="n">cpu_job</span> <span class="o">=</span> <span class="n">job</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">job</span><span class="o">-></span><span class="n">job_type</span> <span class="o">>=</span> <span class="n">ARRAY_SIZE</span><span class="p">(</span><span class="n">cpu_job_function</span><span class="p">))</span> <span class="p">{</span>
<span class="n">DRM_DEBUG_DRIVER</span><span class="p">(</span><span class="s">"Unknown CPU job: %d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">job</span><span class="o">-></span><span class="n">job_type</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">trace_v3d_cpu_job_begin</span><span class="p">(</span><span class="o">&</span><span class="n">v3d</span><span class="o">-></span><span class="n">drm</span><span class="p">,</span> <span class="n">job</span><span class="o">-></span><span class="n">job_type</span><span class="p">);</span>
<span class="n">cpu_job_function</span><span class="p">[</span><span class="n">job</span><span class="o">-></span><span class="n">job_type</span><span class="p">](</span><span class="n">job</span><span class="p">);</span>
<span class="n">trace_v3d_cpu_job_end</span><span class="p">(</span><span class="o">&</span><span class="n">v3d</span><span class="o">-></span><span class="n">drm</span><span class="p">,</span> <span class="n">job</span><span class="o">-></span><span class="n">job_type</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The interesting thing is that each CPU job type executes a completely different operation.</p>
<p>The complete kernel implementation has already landed in drm-misc-next and can
be seen right
<a href="https://lore.kernel.org/dri-devel/20231130164420.932823-2-mcanal@igalia.com/T/">here</a>.</p>
<h2 id="what-did-we-change-in-mesa-v3dv-to-use-the-new-kernel-v3d-cpu-job">What did we change in Mesa-V3DV to use the new kernel-V3D CPU job?</h2>
<p>After landing the kernel implementation, I needed to accommodate the new CPU job
approach in userspace.</p>
<p>A fundamental rule is not to cause regressions, i.e., to keep backwards
userspace compatibility with old versions of the Linux kernel. This means we
cannot break new versions of Mesa running on old kernels. Therefore, we needed
to create two paths: one preserving the old way to perform CPU jobs and the
other using the kernel to perform CPU jobs.</p>
<p>So, for example, the indirect CSD job used to add two different jobs to the
queue: a CPU job and a CSD job. Now, if we have the CPU job capability in the
kernel, we only add a CPU job and the CSD job is dispatched from within the
kernel.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">- list_addtail(&csd_job->list_link, &cmd_buffer->jobs);
</span><span class="gi">+
+ /* If we have a CPU queue we submit the CPU job directly to the
+ * queue and the CSD job will be dispatched from within the kernel
+ * queue, otherwise we will have to dispatch the CSD job manually
+ * right after the CPU job by adding it to the list of jobs in the
+ * command buffer.
+ */
+ if (!cmd_buffer->device->pdevice->caps.cpu_queue)
+ list_addtail(&csd_job->list_link, &cmd_buffer->jobs);
</span></code></pre></div></div>
<p>Furthermore, now we can use syncobjs to sync the CPU jobs. For example, in the
timestamp query CPU job, we used to stall the submission thread and wait for
completion of all work queued before the timestamp query. Now, we can just add a
barrier to the CPU job and it will be properly synchronized by the syncobjs
without stalling the submission thread.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="cm">/* The CPU job should be serialized so it only executes after all previously
* submitted work has completed
*/</span>
<span class="n">job</span><span class="o">-></span><span class="n">serialize</span> <span class="o">=</span> <span class="n">V3DV_BARRIER_ALL</span><span class="p">;</span>
</code></pre></div></div>
<p>We were able to test the implementation using multiple CTS tests, such as
<code class="language-plaintext highlighter-rouge">dEQP-VK.compute.pipeline.indirect_dispatch.*</code>,
<code class="language-plaintext highlighter-rouge">dEQP-VK.pipeline.monolithic.timestamp.*</code>, <code class="language-plaintext highlighter-rouge">dEQP-VK.synchronization.*</code>,
<code class="language-plaintext highlighter-rouge">dEQP-VK.query_pool.*</code> and <code class="language-plaintext highlighter-rouge">dEQP-VK.multiview.*</code>.</p>
<p>The userspace implementation has already landed in Mesa and the full
implementation can be checked in this
<a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448">MR</a>.</p>
<hr />
<p>More about the ongoing challenges in the Raspberry Pi driver stack can be
seen in this <a href="https://www.youtube.com/watch?v=Gk49xj4jds4">XDC 2023 talk</a>
presented by Iago Toral, Juan Suárez and myself. During this talk, Iago
mentioned the CPU job work that we have been doing.</p>
<p>Also, I cannot finish this post without thanking <a href="https://melissawen.github.io/">Melissa
Wen</a> and <a href="https://blogs.igalia.com/itoral/author/itoral/">Iago
Toral</a> for all the help while
developing the CPU jobs for the V3D kernel driver.</p>2024-01-11T13:30:00+00:00Tomeu Vizoso: Etnaviv NPU update 14: Object detection with decent performance
https://blog.tomeuvizoso.net/2024/01/etnaviv-npu-update-14-object-detection.html
<p>When almost two months ago I <a href="https://blog.tomeuvizoso.net/2023/11/etnaviv-npu-update-11-now-twice-as-fast.html">got MobileNetV1 running with useful performance</a> on my driver for the Vivante NPU, I took that milestone as a partial validation of my approach.</p><p>Partial because MobileNetV1 is a quite old model by now and since then several iterations have passed with better accuracy and better performance. Would I be able to, without any documentation, add enough support to run newer models with useful performance?<br /></p><p>Since then, I have been spending some time looking at the state of the art for object detection models, getting a sense of the gap between the features supported by my driver and the operations that the newer models use.</p><p><a href="https://arxiv.org/abs/2004.14525">SSDLite MobileDet</a> is already 3 years old but can still be considered state-of-the-art on most hardware, with good accuracy while having a low latency.</p><p>The graph structure was more complex than that of MobileNet, and it used tensor addition operations which I didn't support at the time. 
There were other operations that I didn't support, but those came at the end of the graph and could be performed on the CPU without much penalty.</p><p>So after implementing additions, along with a few medium-sized refactorings, I got the model running correctly:<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsKGZYGx2ISm4TZobIq5OCov58aMRXLldRjrjM2dn0uUxuhChV1-gxt4wzLvEq1WZHe8pbdz4MtXML9oN2UCGvq2K_ncYuKkVnK4AG-_xrRGfARWv3kxBBvG20y5eWzFTWeZGazHFMIqaswvk1hl5kN-xArwD2TqjPj-iZxOPVMKzfx8PPbOagoSldJh0/s1536/test1.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="366" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhsKGZYGx2ISm4TZobIq5OCov58aMRXLldRjrjM2dn0uUxuhChV1-gxt4wzLvEq1WZHe8pbdz4MtXML9oN2UCGvq2K_ncYuKkVnK4AG-_xrRGfARWv3kxBBvG20y5eWzFTWeZGazHFMIqaswvk1hl5kN-xArwD2TqjPj-iZxOPVMKzfx8PPbOagoSldJh0/w548-h366/test1.jpg" width="548" /></a></div><p>Performance wasn't bad at that point: at 129 ms it was twice as fast as the CPU and "only" 5 times slower than the proprietary driver.</p><p>I knew that I was using extremely conservative values for the size of the output tiles, so I wrote some scripts to run hundreds of different convolution configurations and tabulate the parameters that the proprietary driver used to program the hardware.</p><p>After a lot of time spent staring at a spreadsheet, I came up with a reasonable guess at the conditions that limit the size of the tiles. By using the biggest tile size that is still safe, I got much better performance: 56.149 ms, so almost 18 inferences can be performed per second.</p><p>If we look at a practical use case such as that supported by <a href="https://frigate.video/">Frigate NVR</a>, a typical frame rate for the video inputs is 5 FPS.
With our current performance level, we could run 3-4 inferences on each frame when several objects are being tracked at the same time, or serve 3-4 cameras simultaneously if not.</p><p>Given the price level of the <a href="https://libre.computer/products/aml-a311d-cc/">single board computers that contain the VIPNano</a>, this is quite a lot of bang for your buck. And it is all open source and heading to mainline!</p><p><b>Next steps</b></p><p>I have started cleaning up the latest changes so they can be reviewed upstream. I also need to make sure that the in-flight kernel patches are merged now that the 6.8 merge window has opened.</p>2024-01-10T11:14:00+00:00Lennart Poettering: A re-introduction to mkosi -- A Tool for Generating OS Images
https://0pointer.net/blog/a-re-introduction-to-mkosi-a-tool-for-generating-os-images.html
<blockquote>
<p>This is a guest post written by Daan De Meyer, systemd and mkosi
maintainer</p>
</blockquote>
<p>Almost 7 years ago, Lennart first
<a href="https://0pointer.net/blog/mkosi-a-tool-for-generating-os-images.html">wrote</a>
about <code>mkosi</code> on this blog. Some years ago, I took over development and
there's been a huge amount of changes and improvements since then. So I
figure this is a good time to re-introduce <code>mkosi</code>.</p>
<p><a href="https://github.com/systemd/mkosi"><code>mkosi</code></a> stands for <em>Make Operating
System Image</em>. It generates OS images that can be used for a variety of
purposes.</p>
<p>If you prefer watching a video over reading a blog post, you can also
watch my <a href="https://www.youtube.com/watch?v=6EelcbjbUa8">presentation</a> on
<code>mkosi</code> at All Systems Go 2023.</p>
<h2>What is mkosi?</h2>
<p><code>mkosi</code> was originally written as a tool to simplify hacking on systemd
and for experimenting with images using many of the new concepts being
introduced in systemd at the time. In the meantime, it has evolved into
a general purpose image builder that can be used in a multitude of
scenarios.</p>
<p>Instructions to install <code>mkosi</code> can be found in its
<a href="https://github.com/systemd/mkosi/blob/main/README.md">readme</a>. We
recommend running the latest version to take advantage of all the latest
features and bug fixes. You'll also need <code>bubblewrap</code> and the package
manager of your favorite distribution to get started.</p>
<p>At its core, the workflow of <code>mkosi</code> can be divided into 3 steps:</p>
<ol>
<li>Generate an OS tree for some distribution by installing a set of
packages.</li>
<li>Package up that OS tree in a variety of output formats.</li>
<li>(Optionally) Boot the resulting image in <code>qemu</code> or <code>systemd-nspawn</code>.</li>
</ol>
<p>Images can be built for any of the following distributions:</p>
<ul>
<li>Fedora Linux</li>
<li>Ubuntu</li>
<li>OpenSUSE</li>
<li>Debian</li>
<li>Arch Linux</li>
<li>CentOS Stream</li>
<li>RHEL</li>
<li>Rocky Linux</li>
<li>Alma Linux</li>
</ul>
<p>And the following output formats are supported:</p>
<ul>
<li>GPT disk images built with <code>systemd-repart</code></li>
<li>Tar archives</li>
<li>CPIO archives (for building initramfs images)</li>
<li>USIs (Unified System Images which are full OS images packed in a UKI)</li>
<li>Sysext, confext and portable images</li>
<li>Directory trees</li>
</ul>
<p>For example, to build an Arch Linux GPT disk image and boot it in
<code>qemu</code>, you can run the following command:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>mkosi<span class="w"> </span>-d<span class="w"> </span>arch<span class="w"> </span>-p<span class="w"> </span>systemd<span class="w"> </span>-p<span class="w"> </span>udev<span class="w"> </span>-p<span class="w"> </span>linux<span class="w"> </span>-t<span class="w"> </span>disk<span class="w"> </span>qemu
</code></pre></div>
<p>To instead boot the image in systemd-nspawn, replace <code>qemu</code> with <code>boot</code>:</p>
<div class="highlight"><pre><span></span><code>$<span class="w"> </span>mkosi<span class="w"> </span>-d<span class="w"> </span>arch<span class="w"> </span>-p<span class="w"> </span>systemd<span class="w"> </span>-p<span class="w"> </span>udev<span class="w"> </span>-p<span class="w"> </span>linux<span class="w"> </span>-t<span class="w"> </span>disk<span class="w"> </span>boot
</code></pre></div>
<p>The actual image can be found in the current working directory, named
<code>image.raw</code>. However, using a separate output directory is recommended,
which is as simple as running <code>mkdir mkosi.output</code>.</p>
<p>To rebuild the image after it's already been built once, add <code>-f</code> to the
command line before the verb. Any arguments passed
after the verb are forwarded to either <code>systemd-nspawn</code> or <code>qemu</code>
itself. To build the image without booting it, pass <code>build</code> instead of
<code>boot</code> or <code>qemu</code> or don't pass a verb at all.</p>
<p>By default, the disk image will have an appropriately sized root
partition and an ESP partition, but the partition layout and contents
can be fully customized using <code>systemd-repart</code> by creating partition
definition files in <code>mkosi.repart/</code>. This allows you to customize the
partition as you see fit:</p>
<ul>
<li>The root partition can be encrypted.</li>
<li>Partition sizes can be customized.</li>
<li>Partitions can be protected with signed dm-verity.</li>
<li>You can opt out of having a root partition and only have a /usr
partition instead.</li>
<li>You can add various other partitions, e.g. an XBOOTLDR partition or a
swap partition.</li>
<li>...</li>
</ul>
<p>As part of building the image, we'll run various tools such as
<code>systemd-sysusers</code>, <code>systemd-firstboot</code>, <code>depmod</code>, <code>systemd-hwdb</code> and
more to make sure the image is set up correctly.</p>
<h2>Configuring mkosi image builds</h2>
<p>Naturally, with extended use, you don't want to specify all settings
on the command line every time, so <code>mkosi</code> supports configuration files
where the same settings can be written down.</p>
<p>For example, the command we used above can be written down in a
configuration file <code>mkosi.conf</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">[Distribution]</span>
<span class="na">Distribution</span><span class="o">=</span><span class="s">arch</span>
<span class="k">[Output]</span>
<span class="na">Format</span><span class="o">=</span><span class="s">disk</span>
<span class="k">[Content]</span>
<span class="na">Packages</span><span class="o">=</span>
<span class="w"> </span><span class="na">systemd</span>
<span class="w"> </span><span class="na">udev</span>
<span class="w"> </span><span class="na">linux</span>
</code></pre></div>
<p>Like systemd, <code>mkosi</code> uses INI configuration files. We also support
dropins which can be placed in <code>mkosi.conf.d</code>. Configuration files can
also be conditionalized using the <code>[Match]</code> section. For example, to
only install a specific package on Arch Linux, you can write the
following to <code>mkosi.conf.d/10-arch.conf</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">[Match]</span>
<span class="na">Distribution</span><span class="o">=</span><span class="s">arch</span>
<span class="k">[Content]</span>
<span class="na">Packages</span><span class="o">=</span><span class="s">pacman</span>
</code></pre></div>
<p>Because not everything you need will be supported in <code>mkosi</code>, we support
running scripts at various points during the image build process where
all extra image customization can be done. For example, if it is found,
<code>mkosi.postinst</code> is called after packages have been installed. Scripts
are executed on the host system by default (in a sandbox), but can be
executed inside the image by suffixing the script with <code>.chroot</code>, so if
<code>mkosi.postinst.chroot</code> is found it will be executed inside the image.</p>
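<p>As an illustration, a minimal <code>mkosi.postinst</code> sketch could look as follows. The marker file and the fallback default for <code>$BUILDROOT</code> are made up for this example; when run by <code>mkosi</code> on the host, <code>$BUILDROOT</code> points at the image root, while a <code>.chroot</code> variant would instead see the image as <code>/</code>:</p>

```shell
#!/bin/sh
# Hypothetical mkosi.postinst sketch: runs after packages are installed.
set -eu

# Fallback so the sketch can run standalone; mkosi sets BUILDROOT itself.
: "${BUILDROOT:=./demo-buildroot}"

# Example customization: drop a marker file into the image.
mkdir -p "$BUILDROOT/etc"
echo "configured by mkosi.postinst" > "$BUILDROOT/etc/image-release"
cat "$BUILDROOT/etc/image-release"
```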
<p>To add extra files to the image, you can place them in <code>mkosi.extra</code> in
the source directory and they will be automatically copied into the
image after packages have been installed.</p>
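<p>For example, a drop-in placed under <code>mkosi.extra</code> simply mirrors the image's filesystem layout (the drop-in name and contents below are made up for illustration):</p>

```shell
# Sketch: anything below mkosi.extra/ is copied verbatim into the image,
# preserving the relative path, after package installation.
mkdir -p mkosi.extra/etc/systemd/system/getty@tty1.service.d
printf '[Service]\nTTYVTDisallocate=no\n' \
  > mkosi.extra/etc/systemd/system/getty@tty1.service.d/override.conf
find mkosi.extra -type f
```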
<h2>Bootable images</h2>
<p>If the necessary packages are installed, <code>mkosi</code> will automatically
generate a UEFI/BIOS bootable image. As <code>mkosi</code> is a systemd project, it
will always build
<a href="https://uapi-group.org/specifications/specs/unified_kernel_image/">UKIs</a>
(Unified Kernel Images), except if the image is BIOS-only (since UKIs
cannot be used on BIOS). The initramfs is built like a regular image by
installing distribution packages and packaging them up in a CPIO archive
instead of a disk image. Specifically, we do not use <code>dracut</code>,
<code>mkinitcpio</code> or <code>initramfs-tools</code> to generate the initramfs from the
host system. <code>ukify</code> is used to assemble all the individual components
into a UKI.</p>
<p>If you don't want <code>mkosi</code> to generate a bootable image, you can set
<code>Bootable=no</code> to explicitly disable this logic.</p>
<h2>Using mkosi for development</h2>
<p>The main requirement to use <code>mkosi</code> for development is that we can
build our source code against the image we're building and install it
into the image we're building. <code>mkosi</code> supports this via build scripts.
If a script named <code>mkosi.build</code> (or <code>mkosi.build.chroot</code>) is found,
we'll execute it as part of the build. Any files put by the build script
into <code>$DESTDIR</code> will be installed into the image. Required build
dependencies can be installed using the <code>BuildPackages=</code> setting. These
packages are installed into an overlay that is put on top of the image
while the build script runs, so they are available during the build but
don't end up in the final image.</p>
<p>An example <code>mkosi.build.chroot</code> script for a project using <code>meson</code> could
look as follows:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
meson<span class="w"> </span>setup<span class="w"> </span><span class="s2">"</span><span class="nv">$BUILDDIR</span><span class="s2">"</span><span class="w"> </span><span class="s2">"</span><span class="nv">$SRCDIR</span><span class="s2">"</span>
ninja<span class="w"> </span>-C<span class="w"> </span><span class="s2">"</span><span class="nv">$BUILDDIR</span><span class="s2">"</span>
<span class="k">if</span><span class="w"> </span><span class="o">((</span>WITH_TESTS<span class="o">))</span><span class="p">;</span><span class="w"> </span><span class="k">then</span>
<span class="w"> </span>meson<span class="w"> </span><span class="nb">test</span><span class="w"> </span>-C<span class="w"> </span><span class="s2">"</span><span class="nv">$BUILDDIR</span><span class="s2">"</span>
<span class="k">fi</span>
meson<span class="w"> </span>install<span class="w"> </span>-C<span class="w"> </span><span class="s2">"</span><span class="nv">$BUILDDIR</span><span class="s2">"</span>
</code></pre></div>
<p>Now, every time the image is built, the build script will be executed
and the results will be installed into the image.</p>
<p>The <code>$BUILDDIR</code> environment variable points to a directory that can be
used as the build directory for build artifacts to allow for incremental
builds if the build system supports it.</p>
<p>Of course, downloading all packages from scratch every time and
re-installing them again every time the image is built is rather slow,
so <code>mkosi</code> supports two modes of caching to speed things up.</p>
<p>The first caching mode caches all downloaded packages so they don't have
to be downloaded again on subsequent builds. Enabling this is as simple
as running <code>mkdir mkosi.cache</code>.</p>
<p>The second mode of caching caches the image after all packages have been
installed but before running the build script. On subsequent builds,
<code>mkosi</code> will copy the cache instead of reinstalling all packages from
scratch. This mode can be enabled using the <code>Incremental=</code> setting.
While there is some rudimentary cache invalidation, the cache can also
forcibly be rebuilt by specifying <code>-ff</code> on the command line instead of
<code>-f</code>.</p>
<p>Note that when running on a btrfs filesystem, <code>mkosi</code> will automatically
use subvolumes for the cached images which can be snapshotted on
subsequent builds for even faster rebuilds. We'll also use reflinks to
do copy-on-write copies where possible.</p>
<p>With this setup, by running <code>mkosi -f qemu</code> in the systemd repository,
it takes about 40 seconds to go from a source code change to a root
shell in a virtual machine running the latest systemd with your change
applied. This makes it very easy to test changes to systemd in a safe
environment without risk of breaking your host system.</p>
<p>Of course, while 40 seconds is not a very long time, it's still more
than we'd like, especially if all we're doing is modifying the kernel
command line. That's why we have the <code>KernelCommandLineExtra=</code> option to
configure kernel command line options that are passed to the container
or virtual machine at runtime instead of being embedded into the image.
These extra kernel command line options are picked up when the image is
booted with qemu's direct kernel boot (using <code>-append</code>), but also when
booting a disk image in UEFI mode (using SMBIOS). The same applies to
systemd credentials (using the <code>Credentials=</code> setting). These settings
allow configuring the image without having to rebuild it, which means
that you only have to run <code>mkosi qemu</code> or <code>mkosi boot</code> again afterwards
to apply the new settings.</p>
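<p>A sketch of such a drop-in is shown below. The setting names come from this post, but the section they live in is my assumption, so check the documentation for your <code>mkosi</code> version:</p>

```shell
# Sketch: runtime-only settings, applied at `mkosi qemu` / `mkosi boot`
# time without rebuilding the image. Section placement is an assumption.
mkdir -p mkosi.conf.d
cat > mkosi.conf.d/20-runtime.conf <<'EOF'
[Host]
KernelCommandLineExtra=systemd.log_level=debug
Credentials=firstboot.locale=C.UTF-8
EOF
cat mkosi.conf.d/20-runtime.conf
```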
<h2>Building images without root privileges and loop devices</h2>
<p>By using <code>newuidmap</code>/<code>newgidmap</code> and <code>systemd-repart</code>, <code>mkosi</code> is able to
build images without needing root privileges. As long as proper subuid
and subgid mappings are set up for your user in <code>/etc/subuid</code> and
<code>/etc/subgid</code>, you can run <code>mkosi</code> as your regular user without having
to switch to <code>root</code>.</p>
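<p>For reference, a subordinate ID entry has the form <code>user:start:count</code>. A sketch for a hypothetical user named <code>builder</code> (the user name and start offset are made up):</p>

```shell
# Sketch: the line format /etc/subuid and /etc/subgid expect, granting
# the user "builder" a 65536-wide subordinate ID range from 1000000.
printf 'builder:1000000:65536\n' | tee subuid.example

# One-shot setup on distributions with shadow-utils (needs root, so
# commented out here):
# usermod --add-subuids 1000000-1065535 --add-subgids 1000000-1065535 builder
```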
<p>Note that as of the writing of this blog post this only applies to the
<code>build</code> and <code>qemu</code> verbs. Booting the image in a <code>systemd-nspawn</code>
container with <code>mkosi boot</code> still needs root privileges. We're hoping to
fix this in a future systemd release.</p>
<p>Regardless of whether you're running <code>mkosi</code> with root or without root,
almost every tool we execute is invoked in a sandbox to isolate as much
of the build process from the host as possible. For example, <code>/etc</code> and
<code>/var</code> from the host are not available in this sandbox, to avoid host
configuration inadvertently affecting the build.</p>
<p>Because <code>systemd-repart</code> can build disk images without loop devices,
<code>mkosi</code> can run from almost any environment, including containers. All
that's needed is a UID range with 65536 UIDs available, either via
running as the root user or via <code>/etc/subuid</code> and <code>newuidmap</code>. In a
future systemd release, we're hoping to provide an alternative to
<code>newuidmap</code> and <code>/etc/subuid</code> to allow running <code>mkosi</code> from all
containers, even those with only a single UID available.</p>
<h2>Supporting older distributions</h2>
<p>mkosi depends on very recent versions of various systemd tools (v254 or
newer). To support older distributions, we implemented so-called tools
trees. In short, <code>mkosi</code> can first build a tools image for you that
contains all required tools to build the actual image. This can be
enabled by adding <code>ToolsTree=default</code> to your mkosi configuration.
Building a tools image does not require a recent version of systemd.</p>
<p>In the systemd mkosi configuration, we automatically use a tools tree if
we detect your distribution does not have the minimum required systemd
version installed.</p>
<h2>Configuring variants of the same image using profiles</h2>
<p>Profiles can be defined in the <code>mkosi.profiles/</code> directory. The profile
to use can be selected using the <code>Profile=</code> setting (or <code>--profile=</code>) on
the command line. A profile allows you to bundle various settings behind
a single recognizable name. You can also match on profiles if you want
to apply some settings only to specific ones.</p>
<p>For example, you could have a <code>bootable</code> profile that sets
<code>Bootable=yes</code>, adds the <code>linux</code> and <code>systemd-boot</code> packages and
configures <code>Format=disk</code> to end up with a bootable disk image when
passing <code>--profile bootable</code> on the command line.</p>
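<p>A sketch of such a profile as a file under <code>mkosi.profiles/</code>; the file layout and setting values follow the description above, so treat it as an illustration rather than a drop-in-ready recipe:</p>

```shell
# Sketch: a "bootable" profile, selected with --profile bootable.
mkdir -p mkosi.profiles
cat > mkosi.profiles/bootable.conf <<'EOF'
[Output]
Format=disk

[Content]
Bootable=yes
Packages=linux
         systemd-boot
EOF
cat mkosi.profiles/bootable.conf
```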
<h2>Building system extension images</h2>
<p><a href="https://uapi-group.org/specifications/specs/extension_image/">System extension</a>
images may, dynamically at runtime, extend the base system with an
overlay containing additional files.</p>
<p>To build system extensions with <code>mkosi</code>, we need a base image on top of
which we can build our extension.</p>
<p>To keep things manageable, we'll make use of <code>mkosi</code>'s support for
building multiple images so that we can build our base image and system
extension in one go.</p>
<p>We start by creating a temporary directory with a base configuration
file <code>mkosi.conf</code> with some shared settings:</p>
<div class="highlight"><pre><span></span><code><span class="k">[Output]</span>
<span class="na">OutputDirectory</span><span class="o">=</span><span class="s">mkosi.output</span>
<span class="na">CacheDirectory</span><span class="o">=</span><span class="s">mkosi.cache</span>
</code></pre></div>
<p>Now let's continue with the base image definition by writing the
following to <code>mkosi.images/base/mkosi.conf</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">[Output]</span>
<span class="na">Format</span><span class="o">=</span><span class="s">directory</span>
<span class="k">[Content]</span>
<span class="na">CleanPackageMetadata</span><span class="o">=</span><span class="s">no</span>
<span class="na">Packages</span><span class="o">=</span><span class="s">systemd</span>
<span class="w"> </span><span class="na">udev</span>
</code></pre></div>
<p>We use the <code>directory</code> output format here instead of the <code>disk</code> output
so that we can build our extension without needing root privileges.</p>
<p>Now that we have our base image, we can define a sysext that builds on
top of it by writing the following to <code>mkosi.images/btrfs/mkosi.conf</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">[Config]</span>
<span class="na">Dependencies</span><span class="o">=</span><span class="s">base</span>
<span class="k">[Output]</span>
<span class="na">Format</span><span class="o">=</span><span class="s">sysext</span>
<span class="na">Overlay</span><span class="o">=</span><span class="s">yes</span>
<span class="k">[Content]</span>
<span class="na">BaseTrees</span><span class="o">=</span><span class="s">%O/base</span>
<span class="na">Packages</span><span class="o">=</span><span class="s">btrfs-progs</span>
</code></pre></div>
<p><code>BaseTrees=</code> points to our base image and <code>Overlay=yes</code> instructs mkosi
to only package the files added on top of the base tree.</p>
<p>We can't sign the extension image without a key. We can generate one
by running <code>mkosi genkey</code> which will generate files that are
automatically picked up when building the image.</p>
<p>Finally, you can build the base image and the extensions by running
<code>mkosi -f</code>. You'll find <code>btrfs.raw</code> in <code>mkosi.output</code> which is the
extension image.</p>
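<p>For convenience, the whole walkthrough above can be reproduced as one script; the configuration file contents are verbatim from this post, and only the shell scaffolding around them is added:</p>

```shell
# Set up the base image + btrfs sysext layout described above.
mkdir -p mkosi.images/base mkosi.images/btrfs

# Shared settings for all images.
cat > mkosi.conf <<'EOF'
[Output]
OutputDirectory=mkosi.output
CacheDirectory=mkosi.cache
EOF

# Base image: a plain directory tree, buildable without root.
cat > mkosi.images/base/mkosi.conf <<'EOF'
[Output]
Format=directory

[Content]
CleanPackageMetadata=no
Packages=systemd
         udev
EOF

# Sysext layered on top of the base image.
cat > mkosi.images/btrfs/mkosi.conf <<'EOF'
[Config]
Dependencies=base

[Output]
Format=sysext
Overlay=yes

[Content]
BaseTrees=%O/base
Packages=btrfs-progs
EOF

find mkosi.images -name mkosi.conf
# mkosi genkey && mkosi -f   # then find btrfs.raw in mkosi.output/
```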
<h2>Various other interesting features</h2>
<ul>
<li>To sign any generated UKIs for secure boot, put your secure boot key
and certificate in <code>mkosi.key</code> and <code>mkosi.crt</code> and enable the
<code>SecureBoot=</code> setting. You can also run <code>mkosi genkey</code> to have <code>mkosi</code>
generate a key and certificate itself.</li>
<li>The <code>Ephemeral=</code> setting can be enabled to boot the image in an
ephemeral copy that is thrown away when the container or virtual
machine exits.</li>
<li><code>ShimBootloader=</code> and <code>BiosBootloader=</code> settings are available to
configure shim and grub installation if needed.</li>
<li><code>mkosi</code> can boot directory trees in a virtual machine using <code>virtiofsd</code>. This
is very useful for quickly rebuilding an image and booting it as the
image does not have to be packed up as a disk image.</li>
<li>...</li>
</ul>
<p>There are many more features that we won't go over in detail in this
blog post. Learn more about those by reading the
<a href="https://github.com/systemd/mkosi/blob/main/mkosi/resources/mkosi.md">documentation</a>.</p>
<h2>Conclusion</h2>
<p>I'll finish with a bunch of links to more information about <code>mkosi</code> and
related tooling:</p>
<ul>
<li><a href="https://github.com/systemd/mkosi">Github repository</a></li>
<li><a href="https://fedoramagazine.org/create-images-directly-from-rhel-and-rhel-ubi-package-using-mkosi/">Building RHEL and RHEL UBI images with mkosi</a></li>
<li><a href="https://media.ccc.de/v/all-systems-go-2023-191-systemd-repart-building-discoverable-disk-images">My presentation on systemd-repart at ASG 2023</a></li>
<li><a href="https://matrix.to/#/#mkosi:matrix.org">mkosi's Matrix channel</a></li>
<li><a href="https://raw.githubusercontent.com/systemd/systemd/main/mkosi.conf">systemd's mkosi configuration</a></li>
<li><a href="https://github.com/systemd/systemd/tree/main/mkosi.conf.d">mkosi's mkosi configuration</a></li>
</ul>2024-01-09T23:00:00+00:00Mike Blumenkrantz: First Bug Down
https://www.supergoodcode.com/first-bug-down/
<h1 id="slow-start">Slow Start</h1>
<p>It’s been a slow start to the year, by which I mean I’ve been buried under an absolute deluge of all the things you can imagine and then also a blizzard. The literal kind, not the kind that used to make great games.</p>
<p>Anyway, it’s not all fun and specs in my capacity as CEO of OpenGL. Sometimes I gotta do Real Work. The number one source of Real Work, as always, is <del>my old code</del> the mesa bug tracker.</p>
<p>Unfortunately, the thing is completely overloaded with NVIDIA bugs right now, so it was slim pickins.</p>
<h1 id="another-game-ive-never-heard-of">Another Game I’ve Never Heard Of</h1>
<p>Am I a boomer? Is this what being a boomer feels like? I really have lived long enough to see myself become the villain.</p>
<p>Next bug up is from this game called <a href="https://store.steampowered.com/app/892970/Valheim/">Valheim</a>. I think it’s a LARPing chess game? Something like that? Don’t @ me.</p>
<p><a href="https://gitlab.freedesktop.org/mesa/mesa/-/issues/10386">This report</a> came in hot over the break with some rad new shading techniques:</p>
<p><a href="https://gitlab.freedesktop.org/mesa/mesa/uploads/549fc90c96a105272133823b090a4ba2/valheim-glitch-4.png"><img alt="hm" src="https://gitlab.freedesktop.org/mesa/mesa/uploads/549fc90c96a105272133823b090a4ba2/valheim-glitch-4.png" /></a></p>
<p>It looks way cooler if you play the trace, but you get the idea.</p>
<h1 id="pinpoint-accuracy">Pinpoint Accuracy</h1>
<p>First question: what in the Sam Hill is going on here?</p>
<p>Apparently <code class="language-plaintext highlighter-rouge">RADV_DEBUG=hang</code> fixes it, which was a curious one since no other env vars affected the issue. This means the problem is somehow caused by an issue related to the actual Vulkan queue submissions, since (according to legendary multipatch chef Samuel “<a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26930">PLZ SEND REVIEWS!!</a>” Pitoiset) this flag synchronizes the queue after every submit.</p>
<p>It’s therefore no surprise that renderdoc was useless. When viewed in isolation, each frame is perfect, but when played at speed the synchronization is lost.</p>
<p>My first stops, as anyone would expect, were the sites of queue submission in zink. This means flush and present.</p>
<p>Now, I know not everyone is going to be comfortable taking this kind of wild, unhinged guess like I did, but stick with me here. The first thing I checked was a breakpoint on <code class="language-plaintext highlighter-rouge">zink_flush()</code>, which is where API flush calls filter through. There were the usual end-of-frame hits, but there were a fair number of calls originating from <a href="https://registry.khronos.org/OpenGL-Refpages/gl4/html/glFenceSync.xhtml">glFenceSync</a>, which is the way a developer can subtly inform a GL driver that they definitely know what they’re doing.</p>
<p>So I saw these calls coming in, and I stepped through <code class="language-plaintext highlighter-rouge">zink_flush()</code>, and I reached <a href="https://gitlab.freedesktop.org/mesa/mesa/-/blob/b06f6e00fba6e33c28a198a1bb14b89e9dfbb4ae/src/gallium/drivers/zink/zink_context.c#L3866">this</a> spot:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">batch</span><span class="o">-></span><span class="n">has_work</span><span class="p">)</span> <span class="p">{</span>
<span class="o"><-----</span><span class="n">HERE</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pfence</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/* reuse last fence */</span>
<span class="n">fence</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-></span><span class="n">last_fence</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">deferred</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">zink_batch_state</span> <span class="o">*</span><span class="n">last</span> <span class="o">=</span> <span class="n">zink_batch_state</span><span class="p">(</span><span class="n">ctx</span><span class="o">-></span><span class="n">last_fence</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">last</span><span class="p">)</span> <span class="p">{</span>
<span class="n">sync_flush</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">last</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">last</span><span class="o">-></span><span class="n">is_device_lost</span><span class="p">)</span>
<span class="n">check_device_lost</span><span class="p">(</span><span class="n">ctx</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-></span><span class="n">tc</span> <span class="o">&&</span> <span class="o">!</span><span class="n">ctx</span><span class="o">-></span><span class="n">track_renderpasses</span><span class="p">)</span>
<span class="n">tc_driver_internal_flush_notify</span><span class="p">(</span><span class="n">ctx</span><span class="o">-></span><span class="n">tc</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">fence</span> <span class="o">=</span> <span class="o">&</span><span class="n">batch</span><span class="o">-></span><span class="n">state</span><span class="o">-></span><span class="n">fence</span><span class="p">;</span>
<span class="n">submit_count</span> <span class="o">=</span> <span class="n">batch</span><span class="o">-></span><span class="n">state</span><span class="o">-></span><span class="n">usage</span><span class="p">.</span><span class="n">submit_count</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">deferred</span> <span class="o">&&</span> <span class="o">!</span><span class="p">(</span><span class="n">flags</span> <span class="o">&</span> <span class="n">PIPE_FLUSH_FENCE_FD</span><span class="p">)</span> <span class="o">&&</span> <span class="n">pfence</span><span class="p">)</span>
<span class="n">deferred_fence</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="k">else</span>
<span class="n">flush_batch</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="nb">true</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Now this is a real puzzler, because if you know what you’re doing as a developer, you shouldn’t be reaching this spot. This is the penalty box where I put all the developers who <em>don’t</em> know what they’re doing, the spot where I push up my massive James Webb Space Telescope glasses and say, “No, ackchuahlly you don’t want to flush right now.” Because you only reach this spot if you trigger a flush when there’s nothing to flush.</p>
<p>OR DO YOU?</p>
<p>For hahas, I noped out the first part of that conditional, ensuring that all flushes would translate to queue submits, and magically the bug went away. It was a miracle. Until I tried to think through what must be happening for that to have any effect.</p>
<h1 id="synchronization-you-cannot-escape">Synchronization: You Cannot Escape</h1>
<p>The reason this was especially puzzling is the call sequence was:</p>
<ul>
<li>end-of-frame flush</li>
<li>present</li>
<li>glFenceSync flush</li>
</ul>
<p>which means the last flush was optimized out, instead returning the fence from the end-of-frame flush. And these <em>should</em> be identical in terms of operations the app would want to wait on.</p>
<p>Except that there’s a present in there, and technically that’s a queue submission, and <em>technically</em> something might want to know if the submit for that has completed?</p>
<p>Why yes, that <em>is</em> stupid, but here at SGC, stupidity is our sustenance.</p>
<p>Anyway, I <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26935">blasted out</a> a quick fix, and now you can all go play your favorite chess sim on your favorite driver again.</p>2024-01-08T00:00:00+00:00Mike Blumenkrantz: Manifesto
https://www.supergoodcode.com/manifesto/
<h1 id="this-is-it">This Is It.</h1>
<p>It’s been a long break for the blog, but now we’re back and THE MEME FACTORY IS OPEN FOR BUSINESS.</p>
<p>—is what I’d say if it were any other year. But it’s not any other year. This is 2024, and 2024 is a very special year.</p>
<p>It’s the year a decades-old plan has finally yielded its dividends.</p>
<h1 id="truth">Truth.</h1>
<p>You’ve all heard certain improbable claims before. <em>Big Triangle</em> this. <em>Big Triangle</em> that. Everyone knows who they are. Some have even <a href="https://github.com/zmike/vkoverhead/pull/24#issuecomment-1734067828">accused me</a> of being a shill for Big Triangle from time to time. At last, however, I can finally pull off my mask to reveal the truth for the world.</p>
<p>I was born for a single purpose. As a child, I was grouped in with a number of other candidates. We were trained. Tested. Forged. Unshakable bonds grew between us, bonds we’ll never forget. Bonds that were threatened and broken again and again through harrowing selection processes that culled our ranks.</p>
<p>In time, I was the only one remaining. The only one who survived that brutal gauntlet to fulfill an ultimate goal.</p>
<p>The goal of infiltrating Big Triangle.</p>
<p>More time passed. Days. Months. Years. I continued my quiet training, never letting on to my true purpose.</p>
<p>Now, finally, I’ve achieved the impossible. I’ve attained a status within the ranks of Big Triangle that leaves me in command of vast, unfathomable resources.</p>
<p>I have become an officer.</p>
<p><a href="https://www.supergoodcode.com/assets/itsreal.png"><img alt="itsreal.png" src="https://www.supergoodcode.com/assets/itsreal.png" /></a></p>
<p>I am the chair.</p>
<h1 id="revolution">Revolution.</h1>
<p>Now is the time to rise up, my friends. We must take back the triangles—those big and small, success green and failure red, variable rate shaded and fully shaded, all of them together. We must take them and we must fight. No longer will our goals remain the mere unfulfilled dreams of our basement-dwelling forebears!</p>
<ul>
<li>
<p><strong>OpenGL 10.0 by 2025!</strong></p>
</li>
<li>
<p><strong>Compatibility Profile shall be renamed ‘SLOW MODE’</strong></p>
</li>
<li>
<p><strong>OpenGL ES shall retroactively convert to a YEAR-MONTH versioning scheme with quarterly releases!</strong></p>
</li>
<li>
<p><strong>Depth values shall be uniformly scaled across all hardware and platforms!</strong></p>
</li>
<li>
<p><strong>XFB shall be outlawed!</strong></p>
</li>
<li>
<p><strong>Linux game ports shall no longer link to LLVM!</strong></p>
</li>
<li>
<p><strong>Coherent API error messages shall be printed!</strong></p>
</li>
<li>
<p><strong>Vendors which cannot ship functional Windows GL drivers shall ship Zink!</strong></p>
</li>
<li>
<p><strong>Native GL drivers on mobile platforms shall be outlawed!</strong></p>
</li>
<li>
<p><strong>gl_PointSize shall be replaced by the constant ‘1.0’ in all cases!</strong></p>
</li>
<li>
<p><strong>Mesh and ray-tracing extensions from NVIDIA shall become core functionality!</strong></p>
</li>
<li>
<p><strong>GLX shall be deleted and forgotten!</strong></p>
</li>
<li>
<p><strong>All bug reports shall contain at least one quality meme in the OP as a form of spam prevention!</strong></p>
</li>
</ul>
<p>Rise up and join me, your new GL/ES chair, in the glorious revolution!</p>
<h1 id="disclaimer">DISCLAIMER</h1>
<p>Obviously this is all a joke (except the part where I’m the 🪑, that’s <a href="https://www.khronos.org/about/working-group-officers/">100% real af</a>), but I still gotta put a disclaimer here because otherwise I’m gonna be in biiiiig trouble if this gets taken seriously.</p>
<p>Happy New Year. I missed you.</p>2024-01-02T00:00:00+00:00Christian Gmeiner: The Year 2023 in Retrospect
https://christian-gmeiner.info/2022-12-26-end-of-year/
<p>The holidays are here and I have time to look back at 2023. For six months now I have been working for <a href="https://www.igalia.com/">Igalia</a>, and what can I say?</p>
<p>I ❤️ it!</p>
<p>Leaving the comfort zone of a normal 9-5 job was the best decision I could have made. I am so proud to work on open source GPU drivers, and I get to spend much of my work time on etnaviv.</p>
<h2 id="driver-maintenance">Driver maintenance</h2>
<p>Before adding any new feature I thought it would be a great idea to improve the current state of etnaviv’s gallium driver. Therefore I reworked some general driver <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=ae828a33a74c5b3fc6abee481eac7cb57bf815d0">code</a> to be more consistent and to feel more modern, and made it possible to drop some <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=e13bdbbd5bfc1cef00cf504b0567238ae8f45524">hand-rolled</a> conversion helpers by switching to already existing solutions (<code>U_FIXED(..)</code>, <code>S_FIXED(..)</code>, <code>float_to_ubyte(..)</code>).</p>
<p>I worked through the low hanging fruits of crashes seen in CI runs and <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=add14d6cfb6b2aa666c7dbe2bbe43a8926d62d34">fixed</a> <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=a11501e014c82a51e606df079cc0dec2538fd860">many</a> of <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=9342544ca5c9ec2d7c100fe80f3cb6ac41547231">them</a>.</p>
<p>Feature wise, I also looked at some easy to implement extensions like <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=62e0f6bf328e37f3c4704ca35427c3dde0744977">GL_NV_conditional_render</a> and <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=dadb7244bb3df10b1418146b5a5c1cffa8364973">GL_OES_texture_half_float_linear</a>.</p>
<p>Besides the gallium driver I also worked on <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=fb48d3d1da0ab493fbd22f62dd85a9ab0c0811a0">some</a> <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=f831883af6389097624d0f9d8b067eb59b2c4780">NIR</a> and <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=2c9a59dcfc1fc5674a590f6d157f76ce57bd9cac">isaspec</a> <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=b2e4972339711a9576ec309ecdd4f42eb664c2f9">features</a> that are <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=fa0ff0849c5d96534195d276658aa8211d115076">beneficial</a> for etnaviv.</p>
<h2 id="xdc2023">XDC2023</h2>
<p>A personal highlight was to give a talk about etnaviv at XDC2023 <strong>in person</strong>.</p>
<div style="padding-bottom: 56.25%; height: 0; overflow: hidden;">
</div>
<p>You might wonder what has happened in etnaviv land since mid-October.</p>
<h2 id="gles3">GLES3</h2>
<p>I worked on some features that are needed to expose GLES3, and it turned out that a compiler backend that is easy to maintain, extend and test is needed. Sadly, etnaviv’s current backend compiler checks none of these boxes. It is so fragile that I only added the <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=5a952807487255cb8e3be6bc2eb66041f7f7785b">lowerings</a> needed to pass some of the <code>dEQP-GLES3.functional.shaders.texture_functions.*</code> tests.</p>
<p>More fun work on feature emulation is on the horizon, and it too is blocked by the current compiler.</p>
<h2 id="backend-compiler">Backend Compiler</h2>
<p>etnaviv includes an <a href="https://docs.mesa3d.org/isaspec.html">isaspec</a> powered <a href="https://cgit.freedesktop.org/mesa/mesa/commit/?id=64caf906328dad0491a07898cf4b6382f4baab35">disassembler</a> now - a small step towards a new backend compiler. Next on the road to success is the etnaviv backend IR with an assembler.</p>
<p>The new backend compiler is already able to run OpenCL kernels with the help of rusticl, but I want to land it in smaller chunks that are easier to review.</p>
<h2 id="multiple-render-targets">Multiple Render Targets</h2>
<p>During my XDC presentation I talked about a feature I got working on GC7000L - Multiple Render Targets (MRT). At that point it was more or less a proof of concept on the gallium side. Some bits and registers needed for full support on more GPU models were still missing, so more reverse engineering work was required, and the gallium driver itself needed lots of work to add support for MRT.</p>
<p>Some weeks later I had MRT working on a wider range of Vivante GPUs that support this feature, including GC2000, GC3000 and GC7000 models, among others. As etnaviv makes heavy use of GPU feature detection, it should work on even more models.</p>
<h2 id="looking-forward-to-2024">Looking forward to 2024</h2>
<p>I am really confident that we will see GLES3 and OpenCL for etnaviv. As driver testing is quite important for my work, I will expand my current board farm and look into the new star in the CI world - <a href="https://gfx-ci.pages.freedesktop.org/ci-tron/">ci-tron</a>.</p>
<p>With that, have a happy holiday season and we’ll be back with more improvements in 2024!</p>2023-12-26T17:03:20+00:00Ricardo Garcia: Vulkan extensions Igalia helped ship in 2023
https://rg3.name/202312221450.html
<div class="paragraph">
<p>Last year I wrote a <a href="https://rg3.name/202212122137.html">recap of the Vulkan extensions Igalia helped ship in 2022</a>, and in this post I’ll do the exact same for 2023.</p>
</div>
<div class="imageblock">
<div class="content">
<img alt="Igalia Logo next to the Vulkan Logo" src="https://rg3.name/img/igalia-and-vulkan.png" />
</div>
</div>
<div class="paragraph">
<p>For context and quoting the previous recap:</p>
</div>
<div class="quoteblock">
<blockquote>
<div class="paragraph">
<p>The ongoing collaboration between <a href="https://www.valvesoftware.com/">Valve</a> and <a href="https://www.igalia.com/">Igalia</a> lets me and some of my colleagues work on improving the open-source <a href="https://github.com/KhronosGroup/VK-GL-CTS">Vulkan and OpenGL Conformance Test Suite</a>.
This work is essential to ship quality Vulkan drivers and, from the Khronos side, to improve the Vulkan standard further by, among other things, adding new functionality through API extensions.
When creating a new extension, apart from reaching consensus among vendors about the scope and shape of the new APIs, CTS tests are developed in order to check the specification text is clear and vendors provide a uniform implementation of the basic functionality, corner cases and, sometimes, interactions with other extensions.</p>
</div>
<div class="paragraph">
<p>In addition to our CTS work, many times we review the <a href="https://github.com/KhronosGroup/Vulkan-Docs/">Vulkan specification text</a> from those extensions we develop tests for. We also do the same for other extensions and changes, and we also submit fixes and improvements of our own.</p>
</div>
</blockquote>
</div>
<div class="paragraph">
<p>So, without further ado, this is the list of extensions we helped ship in 2023.</p>
</div>
<div class="sect3">
<h4 id="_vk_ext_attachment_feedback_loop_dynamic_state"><a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_attachment_feedback_loop_dynamic_state.html">VK_EXT_attachment_feedback_loop_dynamic_state</a></h4>
<div class="paragraph">
<p>This extension builds on last year’s <a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_attachment_feedback_loop_layout.html">VK_EXT_attachment_feedback_loop_layout</a>, which is used by <a href="https://github.com/doitsujin/dxvk/">DXVK</a> 2.0+ to more efficiently support D3D9 games that read from active render targets.
The new extension shipped this year adds support for setting attachment feedback loops dynamically on command buffers.
As with all extensions that add more dynamic state, the goal here is to reduce the number of pipeline objects applications need to create, which makes using the API more flexible.
It was created by our beloved super good coder and Valve contractor <a href="https://www.supergoodcode.com/">Mike Blumenkrantz</a>.
We reviewed the spec and are listed as contributors, and we wrote dynamic variants of the existing CTS tests.</p>
</div>
</div>
<div class="sect3">
<h4 id="_vk_ext_depth_bias_control"><a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_depth_bias_control.html">VK_EXT_depth_bias_control</a></h4>
<div class="paragraph">
<p>A new extension proposed by <a href="https://github.com/Joshua-Ashton">Joshua Ashton</a> that also helps with layering D3D9 on top of Vulkan.
The original problem is quite specific.
In D3D9 and other APIs, applications can specify what is called a “depth bias” for geometry using an offset that is to be added directly as an exact value to the original depth of each fragment.
In Vulkan, however, the depth bias is expressed as a factor of “r”, where “r” is a number that depends on the depth buffer format and, furthermore, may not have a specific fixed value.
Implementations can use different values of “r” in an acceptable range.
The mechanism provided by Vulkan without this extension is useful to apply small offsets and solve some problems, but it’s not useful to apply large offsets and/or emulate D3D9 by applying a fixed-value bias.
The new extension solves these problems by giving apps the chance to control depth bias in a precise way.
We reviewed the spec and are listed as contributors, and wrote CTS tests for this extension to help ship it.</p>
</div>
</div>
<div class="sect3">
<h4 id="_vk_ext_dynamic_rendering_unused_attachments"><a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_dynamic_rendering_unused_attachments.html">VK_EXT_dynamic_rendering_unused_attachments</a></h4>
<div class="paragraph">
<p>This extension was proposed by Piers Daniell from NVIDIA to lift some restrictions in the original <a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_dynamic_rendering.html">VK_KHR_dynamic_rendering</a> extension, which is used in Vulkan to avoid having to create render passes and framebuffer objects.
Dynamic rendering is very interesting because it makes the API much easier to use and, in many cases and especially on desktop platforms, it can be shipped without any associated performance loss.
The new extension relaxes some restrictions that made pipelines more tightly coupled with render pass instances.
Again, the goal here is to be able to reuse the same pipeline object with multiple render pass instances and remove some combinatorial explosions that may occur in some apps.
We reviewed the spec and are listed as contributors, and wrote CTS tests for the new extension.</p>
</div>
</div>
<div class="sect3">
<h4 id="_vk_ext_image_sliced_view_of_3d"><a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_image_sliced_view_of_3d.html">VK_EXT_image_sliced_view_of_3d</a></h4>
<div class="paragraph">
<p>Shipped at the beginning of the year by Mike Blumenkrantz, this extension again helps with emulating other APIs on top of Vulkan.
Specifically, the extension allows creating 3D views of 3D images such that the views contain a subset of the slices in the image, using a Z offset and range, in the same way D3D12 allows.
We reviewed the spec, we’re listed as contributors, and we wrote CTS tests for it.</p>
</div>
</div>
<div class="sect3">
<h4 id="_vk_ext_pipeline_library_group_handles"><a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_pipeline_library_group_handles.html">VK_EXT_pipeline_library_group_handles</a></h4>
<div class="paragraph">
<p>This one comes from Valve contractor <a href="https://github.com/HansKristian-Work">Hans-Kristian Arntzen</a>, who is mostly known for working on Proton projects like <a href="https://github.com/HansKristian-Work/vkd3d-proton">VKD3D-Proton</a>.
The extension is related to ray tracing and adds more flexibility when creating ray tracing pipelines.
Ray tracing pipelines can hold thousands of different shaders and are sometimes built incrementally by combining so-called pipeline libraries that contain subsets of those shaders.
However, to properly use those pipelines we need to create a structure called a shader binding table, which is full of shader group handles that have to be retrieved from pipelines.
Prior to this extension, shader group handles from pipeline libraries had to be queried again once the final pipeline was linked, as they were not guaranteed to stay constant throughout the whole process.
With this extension, an implementation can tell apps they will not modify shader group handles in subsequent link steps, which makes it easier for apps to build shader binding tables.
More importantly, this also more closely matches functionality in DXR 1.1, making it easier to emulate DirectX Raytracing on top of Vulkan raytracing.
We reviewed the spec, we’re listed as contributors and we wrote CTS tests for it.</p>
</div>
</div>
<div class="sect3">
<h4 id="_vk_ext_shader_object"><a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_EXT_shader_object.html">VK_EXT_shader_object</a></h4>
<div class="paragraph">
<p>Shader objects is probably the best-known extension shipped this year, and we contributed small bits to it.
This extension makes every piece of state dynamic and removes the need to use pipelines.
It’s always used in combination with dynamic rendering, which also removes render passes and framebuffers as explained above.
This results in great flexibility from the application point of view.
The extension was created by Daniel Story from Nintendo, and its vast set of CTS tests was created by <a href="https://github.com/ziga-lunarg">Žiga Markuš</a>, but we did our bit by reviewing the spec and proposing some changes (which is why we’re listed as contributors), as well as fixing some shader object tests and providing some improvements here and there once they had been merged.
A good part of this work was done in coordination with the Mesa developers who were working on implementing this extension in different drivers.</p>
</div>
</div>
<div class="sect3">
<h4 id="_vk_khr_video_encode_h264_and_vk_khr_video_encode_h265"><a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_video_encode_h264.html">VK_KHR_video_encode_h264</a> and <a href="https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_video_encode_h265.html">VK_KHR_video_encode_h265</a></h4>
<div class="paragraph">
<p>Fresh out of the oven, these Vulkan Video extensions allow leveraging the hardware to efficiently encode H.264 and H.265 streams.
This year we’ve been doing a ton of work related to Vulkan Video in <a href="https://gitlab.freedesktop.org/zzoon/mesa/-/tree/h264enc_anv_4?ref_type=heads">drivers</a>, <a href="https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5739">libraries</a> like <a href="https://blogs.igalia.com/scerveau/vulkan-video-encoder-in-gstreamer/">GStreamer</a> and <a href="https://www.youtube.com/watch?v=3kw_maj-v6g">CTS/spec</a>, including the two extensions mentioned above.
Although not listed as contributors to the spec in those two Vulkan extensions, our work played a major role in advancing the state of Vulkan Video and getting them shipped.</p>
</div>
</div>
<div class="sect3">
<h4 id="_epilogue">Epilogue</h4>
<div class="paragraph">
<p>That’s it for this year!
I’m looking forward to helping ship more extension work next year, and to doing my part in making Vulkan drivers on Linux (and other platforms!) more stable and feature-rich.
My Vulkan Video colleagues at Igalia have already started work on future Vulkan Video extensions for AV1 and VP9.
Hopefully some of that work is ratified next year.
Fingers crossed!</p>
</div>
</div>2023-12-22T14:50:00+00:00Tomeu Vizoso: Etnaviv NPU update 13: Don't cross the tensors
https://blog.tomeuvizoso.net/2023/12/etnaviv-npu-update-13-dont-cross-tensors.html
<p></p><p></p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgq1wKENtMzx01kGsnLXjmoCFGpyA67hSvWs1nAWXBftImNiTWD2dnfWaRWqhROBRcygMum9WfqZFp01ijApbVuwPWbXte4ds5pv2M_GyIcya_Ma0ZJJjoZIwrBk07X60PB7mB2Dp2r0NVtURa81yOHaOMNfS9Sr9avrF92NUfegfcqg5DiU7XAfAHUixQ/s389/1520238648692.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgq1wKENtMzx01kGsnLXjmoCFGpyA67hSvWs1nAWXBftImNiTWD2dnfWaRWqhROBRcygMum9WfqZFp01ijApbVuwPWbXte4ds5pv2M_GyIcya_Ma0ZJJjoZIwrBk07X60PB7mB2Dp2r0NVtURa81yOHaOMNfS9Sr9avrF92NUfegfcqg5DiU7XAfAHUixQ/s16000/1520238648692.jpg" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><span class="ILfuVd" lang="en"><span class="hgKElc"><i>"Don't cross the streams. It would be bad."</i></span></span></td></tr></tbody></table><h4 style="text-align: left;">IR refactorings <br /></h4><p>A big part of what I have been up to in the past two weeks has been a
serious refactoring of the data structures that hold the model data in
the different phases until the HW configurations is generated.</p><p>What we had was enough for models with trivial control flow such as MobileNetV1, but more recent models for object classification and detection make use of more operations and those are linked between each other non-sequentially.</p><p>The image below shows six of the more than a hundred operations in the SSDLite MobileDet model:<br /></p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8uT4oTOPviR6_aqbR0KFWycEcCxHBFoptasiS8nfb_2aiJ0XKNBE7BIVjFNBA46LPV204yMIBjrzPkJT_WyWc5k3HUcLLzzAMD9-NWei85UbmKHTgxHTHje8vEIdxQTfAEP9nk7HCWJEtgxpXU3CsrY1xykjiSa9QI35In5amVjFu7OGl8BmUA_j_oQQ/s888/mobiledet_add.png" style="margin-left: auto; margin-right: auto;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8uT4oTOPviR6_aqbR0KFWycEcCxHBFoptasiS8nfb_2aiJ0XKNBE7BIVjFNBA46LPV204yMIBjrzPkJT_WyWc5k3HUcLLzzAMD9-NWei85UbmKHTgxHTHje8vEIdxQTfAEP9nk7HCWJEtgxpXU3CsrY1xykjiSa9QI35In5amVjFu7OGl8BmUA_j_oQQ/w210-h640/mobiledet_add.png" width="210" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">A small subsection of SSDLite MobileDet</td></tr></tbody></table><p>The adds will be "lowered" or converted to a special case of convolution in which the two input tensors are concatenated together as two channels of a single tensor, and the last convolution in the fragment will need to have its input tensor processed to remove the stride as the HW doesn't support those natively. 
The processing of this tensor will be performed in an additional job that will run in the TP (tensor processing) cores in the NPU.</p><p>As you can probably imagine, the modifications to the operation graph will be far from trivial without the right data structures, so I looked at ways of refactoring the code that translates the model as given by TensorFlow Lite to the HW operations.</p><p>For now I have settled into having a separate data structure for the tensors, and having the operations refer to its input and output tensors from the indices in that list. In the future, I think we should move to intermediate representations more akin to what is used in compilers, to support more complex lowerings of operations and reorganizations of the operations inside the model.</p><p>I will be thinking about this later next year, once I get object detection with SSDLite MobileDet running at a useful performance level. Ideally I would like to reuse NIR so drivers can do all the lowerings and optimizations they need without having to reinvent so much of a IR, but if it turns out that operations on tensors aren't a good fit for NIR, then I will be thinking of doing something similar just for it.</p><p>For NPUs with programmable cores it could be very interesting to have a pipeline of transformations that can go from very high level operations to GPGPU instructions, probably starting from a standard such as MLIR.</p><h4 style="text-align: left;">Tensor addition</h4><p>Also put some time in putting together all the information I gathered about how the proprietary driver interacts with the HW when submitting tensor addition jobs, and spent a substantial amount of time looking at the different parameter combinations in a spreadsheet, with liberal use of CORREL() to get a hint of what parameters of the high-level operations are used as inputs in the formulas that produce the HW configuration.</p><h4 style="text-align: left;">Lowering the strides</h4><p>Similarly to the above, there 
was a lot of staring to a spreadsheet for the parameters of the TP jobs that transform the input tensor of a convolution with stride different than one.</p><h4 style="text-align: left;">Status and next steps <br /></h4><p>Below is a rendering of the whole operation graph for the SSDLite MobileDet model, so people can get an idea of the dimensions and complexity of a modern model for edge object detection.</p><p>The model is currently running without anything exploding too badly, and all the convolutions are running correctly when run independently. But when run together, I see some bad results starting to flow around the middle of the graph, so that is what I will be debugging next.<br /></p><p></p><p></p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxIxl-0oWNOqrRirUSUkf7k5b_pYiudHW1aOxIdF5K2MULi1zPldgxEfr2lNi5aZQqfUJ7KpmHFLl6KpWpCC0wbfxDi47I4hswY-p-gfDLsoA68OZfD_9YjxyHqa1maSHXHL9WRKrVik_5haHpLUeRrPwJyeiBwkqAt7iyQxdd7nVrjQYhb-4Z0esauK0/s21360/ssdlite_mobiledet_coco_qat_postprocess.png" style="margin-left: auto; margin-right: auto;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxIxl-0oWNOqrRirUSUkf7k5b_pYiudHW1aOxIdF5K2MULi1zPldgxEfr2lNi5aZQqfUJ7KpmHFLl6KpWpCC0wbfxDi47I4hswY-p-gfDLsoA68OZfD_9YjxyHqa1maSHXHL9WRKrVik_5haHpLUeRrPwJyeiBwkqAt7iyQxdd7nVrjQYhb-4Z0esauK0/w102-h640/ssdlite_mobiledet_coco_qat_postprocess.png" width="102" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">The whole of SSDLite MobileDet<br /></td></tr></tbody></table><br /> <p></p>2023-12-21T08:16:00+00:00Melissa Wen: The Rainbow Treasure Map Talk: Advanced color management on Linux with AMD/Steam Deck.
https://melissawen.github.io/blog/2023/12/20/xdc2023-colors-talk
<p>Last week marked a major milestone for me:
the <a href="https://lore.kernel.org/amd-gfx/20231116195812.906115-1-mwen@igalia.com/">AMD driver-specific color management properties</a>
reached the upstream <a href="https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/drivers/gpu/drm/amd/display?id=9342a9ae54ef299ffe5e4ce3d0be6a4da5edba0e">linux-next</a>!</p>
<p>And to celebrate, I’m happy to share the
<a href="https://indico.freedesktop.org/event/4/contributions/186/attachments/138/218/xdc2023-TheRainbowTreasureMap-MelissaWen.pdf">slides</a>
and notes from my 2023 XDC talk, “The Rainbow Treasure Map,” along with the
<a href="https://www.youtube.com/embed/voI0HxhFzbI">individual recording</a> that just
dropped last week on YouTube – talk about happy coincidences!</p>
<h2 id="steam-deck-rainbow-treasure-map--magic-frogs">Steam Deck Rainbow: Treasure Map & Magic Frogs</h2>
<p>While I may be bubbly and chatty in everyday life, the stage isn’t exactly my
comfort zone (hallway talks are more my speed). But the journey of developing
the AMD color management properties was so full of discoveries that I simply
had to share the experience. Witnessing the fantastic work of Jeremy and Joshua
bring it all to life on the Steam Deck OLED was like uncovering magical
ingredients and whipping up something truly enchanting.</p>
<p>For XDC 2023, we split our Rainbow journey into two talks. My focus, “The
Rainbow Treasure Map,” explored the new color features we added to the Linux
kernel driver, diving deep into the hardware capabilities of AMD/Steam Deck.
Joshua then followed with “The Rainbow Frogs” and showed the breathtaking color
magic released on Gamescope thanks to the power unlocked by the kernel driver’s
Steam Deck color properties.</p>
<h2 id="packing-a-rainbow-into-15-minutes">Packing a Rainbow into 15 Minutes</h2>
<p>I had so much to tell, but a half-slot talk meant crafting a concise
presentation. To squeeze everything into 15 minutes (and calm my pre-talk
jitters a bit!), I drafted and practiced those slides and notes countless
times.</p>
<p>So grab your map, and let’s embark on the Rainbow journey together!</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-1.png"><img alt="Slide 1: The Rainbow Treasure Map - Advanced Color Management on Linux with AMD/SteamDeck" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-1.png" style="display: inline;" width="750" /></a></p>
<p>Intro: Hi, I’m Melissa from Igalia and welcome to the Rainbow Treasure Map, a
talk about advanced color management on Linux with AMD/SteamDeck.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-2.png"><img alt="Slide 2: List useful links for this technical talk" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-2.png" style="display: inline;" width="750" /></a></p>
<p>Useful links: First of all, if you are not used to the topic, you may find
these links useful.</p>
<ol>
<li><a href="https://www.youtube.com/watch?v=CMm-yhsMB7U">XDC 2022 - I’m not an AMD expert, but… - Melissa Wen</a></li>
<li><a href="https://www.youtube.com/watch?v=nDnbWaIMJJA">XDC 2022 - Is HDR Harder? - Harry Wentland</a></li>
<li><a href="https://www.youtube.com/watch?v=BFNkoNnzYAA">XDC 2022 Lightning - HDR Workshop Summary - Harry Wentland</a></li>
<li><a href="https://gitlab.freedesktop.org/pq/color-and-hdr#color-management-and-hdr-documentation-for-foss-graphics">Color management and HDR documentation for FOSS graphics - Pekka Paalanen et al.</a></li>
<li><a href="https://github.com/jeremyselan/cinematiccolor/blob/master/siggraph2012/cinematic_color.pdf">Cinematic Color - 2012 SIGGRAPH course notes - Jeremy Selan</a></li>
<li><a href="https://melissawen.github.io/blog/2023/08/21/amd-steamdeck-colors">AMD Driver-specific Properties for Color Management on Linux (Part 1) - Melissa Wen</a></li>
</ol>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-3.png"><img alt="Slide 3: Why do we need advanced color management on Linux?" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-3.png" style="display: inline;" width="750" /></a></p>
<p>Context: When we talk about colors in the graphics chain, we should keep in
mind that we have a wide variety of source content colorimetry, a variety of
output display devices and also the internal processing. Users expect
consistent color reproduction across all these devices.</p>
<p>Userspace can use GPU-accelerated color management to achieve this, but it also
requires an interface with display kernel drivers that is currently missing
from the DRM/KMS framework.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-4.png"><img alt="Slide 4: Describe our work on AMD driver-specific color properties" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-4.png" style="display: inline;" width="750" /></a></p>
<p>Since April, I’ve been bothering the DRM community by sending patchsets from
the work Joshua and I have done to add driver-specific color properties to the AMD
display driver. In parallel, discussions on defining a generic color management
interface are still ongoing in the community. Moreover, we still don’t have a
clear picture of the diversity of color capabilities among hardware vendors.</p>
<p>To bridge this gap, we defined a color pipeline for Gamescope that fits the
latest versions of AMD hardware. It delivers advanced color management features
for gamut mapping, HDR rendering, SDR on HDR, and HDR on SDR.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-5.png"><img alt="Slide 5: Describe the AMD/SteamDeck - our hardware" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-5.png" style="display: inline;" width="750" /></a></p>
<p>AMD/Steam Deck hardware: AMD frequently releases new GPU and APU generations.
Each generation comes with a DCN version with display hardware improvements.
Therefore, keep in mind that this work uses the AMD Steam Deck hardware and its
kernel driver. The Steam Deck is an APU with a DCN3.01 display driver, part of
the DCN3 family.</p>
<p>It’s important to have this information since newer AMD DCN drivers inherit
implementations from previous families, but each generation of AMD hardware
may also introduce new color capabilities. Therefore, I recommend familiarizing
yourself with the hardware you are working on.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-6.png"><img alt="Slide 6: Diagram with the three layers of the AMD display driver on Linux" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-6.png" style="display: inline;" width="750" /></a></p>
<p>The AMD display driver in the kernel space: It consists of three layers, (1)
the DRM/KMS framework, (2) the AMD Display Manager, and (3) the AMD Display
Core. We extended the color interface exposed to userspace by leveraging
existing DRM resources and connecting them using driver-specific functions for
color property management.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-7.png"><img alt="Slide 7: Three-layers diagram highlighting AMD Display Manager, DM - the layer that connects DC and DRM" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-7.png" style="display: inline;" width="750" /></a></p>
<p>Bridging DC color capabilities and the DRM API required significant changes in
the color management of AMD Display Manager - the Linux-dependent part that
connects the AMD DC interface to the DRM/KMS framework.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-8.png"><img alt="Slide 8: Three-layers diagram highlighting AMD Display Core, DC - the shared code" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-8.png" style="display: inline;" width="750" /></a></p>
<p>The AMD DC is the OS-agnostic layer. Its code is shared between platforms and
DCN versions. Examining this part helps us understand the AMD color pipeline
and hardware capabilities, since the machinery for hardware settings and
resource management is already there.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-9.png"><img alt="Slide 9: Diagram of the AMD Display Core Next architecture with main elements and data flow" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-9.png" style="display: inline;" width="750" /></a></p>
<p>The newest architecture for AMD display hardware is the AMD Display Core Next.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-10.png"><img alt="Slide 10: Diagram of the AMD Display Core Next where only DPP and MPC blocks are highlighted" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-10.png" style="display: inline;" width="750" /></a></p>
<p>In this architecture, two blocks have the capability to manage colors:</p>
<ul>
<li>Display Pipe and Plane (DPP) - for pre-blending adjustments;</li>
<li>Multiple Pipe/Plane Combined (MPC) - for post-blending color transformations.</li>
</ul>
<p>Let’s see what we have in the DRM API for pre-blending color management.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-11.png"><img alt="Slide 11: Blank slide with no content only a title 'Pre-blending: DRM plane'" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-11.png" style="display: inline;" width="750" /></a></p>
<p>DRM plane color properties:</p>
<p>This is the DRM color management API before blending.</p>
<p>Nothing!</p>
<p>Except two basic DRM plane properties: <code class="language-plaintext highlighter-rouge">color_encoding</code> and <code class="language-plaintext highlighter-rouge">color_range</code> for
the input colorspace conversion, which is not covered by this work.</p>
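For intuition, here is a minimal sketch (illustrative Python, not driver code) of the kind of conversion those two properties select, using BT.601 limited-range coefficients:

```python
def ycbcr601_limited_to_rgb(y, cb, cr):
    """Convert one 8-bit BT.601 limited-range YCbCr sample to RGB [0, 255].

    This mirrors what the hardware does when the DRM plane properties are
    color_encoding=BT601 and color_range=LIMITED: expand the limited range
    (Y in [16, 235], Cb/Cr in [16, 240]) and apply the BT.601 matrix.
    """
    yf = (y - 16) / 219.0
    cbf = (cb - 128) / 224.0
    crf = (cr - 128) / 224.0
    r = yf + 1.402 * crf
    g = yf - 0.344136 * cbf - 0.714136 * crf
    b = yf + 1.772 * cbf
    clamp = lambda v: max(0, min(255, round(v * 255)))
    return clamp(r), clamp(g), clamp(b)
```

With a full-range encoding or a different matrix (BT.709, BT.2020), only the constants change; that choice is exactly what these properties communicate to the driver.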
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-12.png"><img alt="Slide 12: Diagram with color capabilities and structures in AMD DC layer without any DRM plane color interface (before blending), only the DRM CRTC color interface for post blending" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-12.png" style="display: inline;" width="750" /></a></p>
<p>In case you’re not familiar with AMD shared code, what we need to do is
basically draw a map and navigate there!</p>
<p>We have some DRM color properties after blending, but nothing before blending
yet. But much of the hardware programming was already implemented in the AMD DC
layer, thanks to the shared code.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-13.png"><img alt="Slide 13: Previous Diagram with a rectangle to highlight the empty space in the DRM plane interface that will be filled by AMD plane properties" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-13.png" style="display: inline;" width="750" /></a></p>
<p>Still, both the DRM interface and its connection to the shared code were
missing. That’s when the search begins!</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-14.png"><img alt="Slide 14: Color Pipeline Diagram with the plane color interface filled by AMD plane properties but without connections to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-14.png" style="display: inline;" width="750" /></a></p>
<p>AMD driver-specific color pipeline:</p>
<p>Looking at the color capabilities of the hardware, we arrive at this initial
set of properties. The path wasn’t exactly like that; we went through many
iterations and discoveries until we reached this pipeline.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-15.png"><img alt="Slide 15: Color Pipeline Diagram connecting AMD plane degamma properties, LUT and TF, to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-15.png" style="display: inline;" width="750" /></a></p>
<p>The Plane Degamma is our first driver-specific property before blending. It’s
used to linearize the color space from encoded values to light linear values.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-16.png"><img alt="Slide 16: Describe plane degamma properties and hardware capabilities" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-16.png" style="display: inline;" width="750" /></a></p>
<p>We can use a pre-defined transfer function or a user lookup table (in short,
LUT) to linearize the color space.</p>
<p>Pre-defined transfer functions for plane degamma are hardcoded curves that go
to a specific hardware block called DPP Degamma ROM. It supports the following
transfer functions: sRGB EOTF, BT.709 inverse OETF, PQ EOTF, and pure power
curves Gamma 2.2, Gamma 2.4 and Gamma 2.6.</p>
<p>We also have a one-dimensional LUT. This 1D LUT has 4096 entries, the usual
1D LUT size in DRM/KMS. It’s an array of
<code class="language-plaintext highlighter-rouge">drm_color_lut</code> that goes to the DPP Gamma Correction block.</p>
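To make the data format concrete, here is a hedged sketch of how userspace might fill such a degamma LUT with the sRGB EOTF, using 16-bit (r, g, b) entries in the shape of the drm_color_lut array (the helper names are made up for illustration):

```python
def srgb_eotf(v):
    """sRGB electro-optical transfer function on normalized input [0, 1]."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def build_degamma_lut(size=4096):
    """Build a linearizing LUT as (r, g, b) 16-bit entries, the same shape
    as the drm_color_lut array userspace hands to the kernel."""
    lut = []
    for i in range(size):
        x = i / (size - 1)
        q = round(srgb_eotf(x) * 0xFFFF)  # quantize to unsigned 16-bit
        lut.append((q, q, q))
    return lut
```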
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-17.png"><img alt="Slide 17: Color Pipeline Diagram connecting AMD plane CTM property to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-17.png" style="display: inline;" width="750" /></a></p>
<p>We also have now a color transformation matrix (CTM) for color space
conversion.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-18.png"><img alt="Slide 18: Describe plane CTM property and hardware capabilities" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-18.png" style="display: inline;" width="750" /></a></p>
<p>It’s a 3x4 matrix of fixed points that goes to the DPP Gamut Remap Block.</p>
<p>Previously, both pre- and post-blending matrices went to the same color
block. We worked on detaching them to clear both paths.</p>
<p>Now each CTM goes its own way.</p>
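A sketch of the fixed-point encoding involved: DRM CTM entries use a sign-magnitude S31.32 format, and a 3x4 matrix adds a per-channel offset column. The helper names below are hypothetical, for illustration only, not the driver's code:

```python
def to_s31_32(v):
    """Encode a float in the sign-magnitude S31.32 fixed-point format used
    by DRM CTMs: bit 63 is the sign, the low 63 bits the magnitude."""
    m = round(abs(v) * (1 << 32))
    return (1 << 63) | m if v < 0 else m

def from_s31_32(u):
    m = u & ~(1 << 63)
    return -(m / (1 << 32)) if u & (1 << 63) else m / (1 << 32)

def apply_ctm_3x4(ctm, rgb):
    """Apply a 3x4 matrix (12 S31.32 entries, row-major) to an RGB triple.
    The fourth column is an additive offset per channel."""
    r, g, b = rgb
    out = []
    for row in range(3):
        c = [from_s31_32(ctm[row * 4 + i]) for i in range(4)]
        out.append(c[0] * r + c[1] * g + c[2] * b + c[3])
    return tuple(out)
```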
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-19.png"><img alt="Slide 19: Color Pipeline Diagram connecting AMD plane HDR multiplier property to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-19.png" style="display: inline;" width="750" /></a></p>
<p>Next, the HDR Multiplier. HDR Multiplier is a factor applied to the color
values of an image to increase their overall brightness.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-20.png"><img alt="Slide 20: Describe plane HDR mult property and hardware capabilities" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-20.png" style="display: inline;" width="750" /></a></p>
<p>This is useful for converting images from a standard dynamic range (SDR) to a
high dynamic range (HDR). As it can range beyond [0.0, 1.0], subsequent
transforms need to use the PQ (HDR) transfer functions.</p>
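As a rough illustration (a Python sketch with an assumed 203-nit SDR reference white, not values taken from the driver), SDR-on-HDR composition can be thought of as scaling linear SDR content by the multiplier and then encoding with the PQ inverse EOTF:

```python
def pq_inv_eotf(nits):
    """PQ (SMPTE ST 2084) inverse EOTF: absolute luminance in nits -> [0, 1]."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    y = (nits / 10000.0) ** m1
    return ((c1 + c2 * y) / (1 + c3 * y)) ** m2

def sdr_on_hdr(linear, multiplier=203.0):
    """Scale linear [0, 1] SDR content by an HDR multiplier (here an
    illustrative 203-nit reference white) and PQ-encode the result."""
    return pq_inv_eotf(linear * multiplier)
```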
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-21.png"><img alt="Slide 21: Color Pipeline Diagram connecting AMD plane shaper properties, LUT and TF, to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-21.png" style="display: inline;" width="750" /></a></p>
<p>And we need a 3D LUT. But a 3D LUT has a limited number of entries in each
dimension, so we want to use it in a colorspace that is optimized for human
vision, that is, a non-linear space. To deliver this, userspace may need one
1D LUT before the 3D LUT to delinearize content and another one after it to
linearize content again for blending.</p>
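A toy 1D sketch of why this shaper/blend pair matters (a hypothetical gamma-2.2 shaper and nearest-neighbor sampling to stand in for the limited LUT resolution; none of this is the driver's code):

```python
def shaper(x, g=2.2):
    """Delinearize: spend the LUT's limited entries where vision is sensitive."""
    return x ** (1 / g)

def blend_tf(x, g=2.2):
    """Re-linearize after the LUT so blending happens in linear space."""
    return x ** g

def through_lut(x, lut):
    """1D stand-in for the 3D LUT path: shaper -> LUT sample -> blend TF.
    Nearest-neighbor sampling stands in for the limited LUT resolution."""
    s = shaper(x)
    i = round(s * (len(lut) - 1))
    return blend_tf(lut[i])

# identity LUT with 17 entries, one of the sizes AMD supports per dimension
identity = [i / 16 for i in range(17)]
```

Sampling in the shaped space keeps the quantization error small even for dark values, where a LUT indexed directly by linear values would waste almost all of its entries on the bright end.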
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-22.png"><img alt="Slide 22: Describe plane shaper properties and hardware capabilities" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-22.png" style="display: inline;" width="750" /></a></p>
<p>The pre-3D-LUT curve is called the Shaper curve. Unlike the Degamma TF, there
are no hardcoded curves for the shaper TF, but we can use the AMD color module in
the driver to build shaper curves from pre-defined coefficients. The color
module combines the TF and the user LUT values into the LUT that goes to
the DPP Shaper RAM block.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-23.png"><img alt="Slide 23: Color Pipeline Diagram connecting AMD plane 3D LUT property to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-23.png" style="display: inline;" width="750" /></a></p>
<p>Finally, our rockstar, the 3D LUT. 3D LUT is perfect for complex color
transformations and adjustments between color channels.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-24.png"><img alt="Slide 24: Describe plane 3D LUT property and hardware capabilities" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-24.png" style="display: inline;" width="750" /></a></p>
<p>3D LUT is also more complex to manage and requires more computational
resources; as a consequence, its number of entries is usually limited. To
overcome this restriction, the array contains samples of the approximated
function, and values between samples are estimated by tetrahedral interpolation.
AMD supports 17 and 9 as the size of a single dimension. Blue is the outermost
dimension, red the innermost.</p>
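To make tetrahedral interpolation concrete, here is a small Python sketch of the general technique (an illustrative implementation, not the hardware's exact arithmetic), using the layout above with blue outermost and red innermost:

```python
def sample_3dlut_tetrahedral(lut, n, r, g, b):
    """Sample an n*n*n 3D LUT with tetrahedral interpolation.

    lut is a flat list of (R, G, B) float tuples, laid out as described
    above: blue outermost, red innermost. r, g, b are in [0, 1].
    """
    def idx(ri, gi, bi):
        return (bi * n + gi) * n + ri  # blue outermost, red innermost

    # cell origin and in-cell fractions
    x, y, z = r * (n - 1), g * (n - 1), b * (n - 1)
    x0, y0, z0 = min(int(x), n - 2), min(int(y), n - 2), min(int(z), n - 2)
    fx, fy, fz = x - x0, y - y0, z - z0
    C = lambda dx, dy, dz: lut[idx(x0 + dx, y0 + dy, z0 + dz)]

    # the ordering of the fractions picks one of six tetrahedra in the
    # cube; walk its edges from c000 to c111, weighting each step
    if fx >= fy >= fz:
        path, w = (C(0,0,0), C(1,0,0), C(1,1,0), C(1,1,1)), (fx, fy, fz)
    elif fx >= fz >= fy:
        path, w = (C(0,0,0), C(1,0,0), C(1,0,1), C(1,1,1)), (fx, fz, fy)
    elif fz >= fx >= fy:
        path, w = (C(0,0,0), C(0,0,1), C(1,0,1), C(1,1,1)), (fz, fx, fy)
    elif fz >= fy >= fx:
        path, w = (C(0,0,0), C(0,0,1), C(0,1,1), C(1,1,1)), (fz, fy, fx)
    elif fy >= fz >= fx:
        path, w = (C(0,0,0), C(0,1,0), C(0,1,1), C(1,1,1)), (fy, fz, fx)
    else:  # fy >= fx >= fz
        path, w = (C(0,0,0), C(0,1,0), C(1,1,0), C(1,1,1)), (fy, fx, fz)

    out = list(path[0])
    for k in range(3):
        for ch in range(3):
            out[ch] += w[k] * (path[k + 1][ch] - path[k][ch])
    return tuple(out)
```

Only four of the eight cube corners are read per sample, which is exactly why tetrahedral interpolation is a popular hardware choice over trilinear.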
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-25.png"><img alt="Slide 25: Color Pipeline Diagram connecting AMD plane blend properties, LUT and TF, to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-25.png" style="display: inline;" width="750" /></a></p>
<p>As mentioned, we need a post-3D-LUT curve to linearize the color space before
blending. This is done by Blend TF and LUT.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-26.png"><img alt="Slide 26: Describe plane blend properties and hardware capabilities" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-26.png" style="display: inline;" width="750" /></a></p>
<p>Similar to the shaper TF, there are no hardcoded curves for the Blend TF. The
pre-defined curves are the same as in the Degamma block, but they are calculated
by the color module. The resulting LUT goes to the DPP Blend RAM block.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-27.png"><img alt="Slide 27: Color Pipeline Diagram with all AMD plane color properties connect to AMD DC resources and links showing the conflict between plane and CRTC degamma" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-27.png" style="display: inline;" width="750" /></a></p>
<p>Now we have everything connected before blending. As a conflict between plane
and CRTC Degamma was inevitable, our approach doesn’t accept that both are set
at the same time.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-28.png"><img alt="Slide 28: Color Pipeline Diagram connecting AMD CRTC gamma TF property to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-28.png" style="display: inline;" width="750" /></a></p>
<p>We also optimized the conversion of the framebuffer to wire encoding by adding
support for pre-defined CRTC Gamma TFs.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-29.png"><img alt="Slide 29: Describe CRTC gamma TF property and hardware capabilities" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-29.png" style="display: inline;" width="750" /></a></p>
<p>Again, there are no hardcoded curves, and the TF and LUT are combined by the
AMD color module. The same types of shaper curves are supported. The resulting
LUT goes to the MPC Gamma RAM block.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-30.png"><img alt="Slide 30: Color Pipeline Diagram with all AMD driver-specific color properties connect to AMD DC resources" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-30.png" style="display: inline;" width="750" /></a></p>
<p>Finally, we arrived at the final version of the DRM/AMD driver-specific color
management pipeline. With this knowledge, you’re ready to better enjoy the
rainbow treasure of AMD display hardware and the world of graphics computing.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-31.png"><img alt="Slide 31: SteamDeck/Gamescope Color Pipeline Diagram with rectangles labeling each block of the pipeline with the related AMD color property" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-31.png" style="display: inline;" width="750" /></a></p>
<p>With this work, Gamescope/Steam Deck embraces the color capabilities of the AMD
GPU. We highlight here how we map the Gamescope color pipeline to each AMD
color block.</p>
<p><a href="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-32.png"><img alt="Slide 32: Final slide. Thank you!" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/xdc-2023-colors-talk/rainbow-treasure-xdc-2023-32.png" style="display: inline;" width="750" /></a></p>
<p>Future works:
The search for the rainbow treasure is not over! The Linux DRM subsystem
contains many hidden treasures from different vendors. We want more complex
color transformations and adjustments available on Linux. We also want to
expose all GPU color capabilities from all hardware vendors to the Linux
userspace.</p>
<p>Thanks Joshua and Harry for this joint work and the Linux DRI community for all feedback and reviews.</p>
<p>The amazing part of this work comes in the next talk with Joshua and The Rainbow Frogs!</p>
<p>Any questions?</p>
<hr />
<p>References:</p>
<ol>
<li><a href="https://indico.freedesktop.org/event/4/contributions/186/attachments/138/218/xdc2023-TheRainbowTreasureMap-MelissaWen.pdf">Slides of the talk The Rainbow Treasure Map</a>.</li>
<li><a href="https://www.youtube.com/embed/voI0HxhFzbI">Youtube video of the talk The Rainbow Treasure Map</a>.</li>
<li><a href="https://lore.kernel.org/amd-gfx/20231116195812.906115-1-mwen@igalia.com/">Patch series for AMD driver-specific color management properties</a> (upstream Linux 6.8v).</li>
<li><a href="https://github.com/ValveSoftware/gamescope/blob/master/src/docs/Steam%20Deck%20Display%20Pipeline.png">SteamDeck/Gamescope color management pipeline</a></li>
<li><a href="https://indico.freedesktop.org/event/4/page/21-overview">XDC 2023 website</a>.</li>
<li><a href="https://www.igalia.com/">Igalia website</a>.</li>
</ol>2023-12-20T12:00:00+00:00Dave Airlie (blogspot): radv: vulkan video encode status
https://airlied.blogspot.com/2023/12/radv-vulkan-video-encode-status.html
<p>Vulkan 1.3.274 moves the Vulkan encode work out of BETA and moves h264 and h265 into KHR extensions. radv support for the Vulkan video encode extensions has been in progress for a while.</p><p>The latest branch is at [1]. This branch has been updated for the new final headers.<br /></p><p>Updated: It passes all of h265 CTS now, but it is failing one h264 test.<br /></p>Initial ffmpeg support is [2].<br /><p>[1] <a href="https://gitlab.freedesktop.org/airlied/mesa/-/tree/radv-vulkan-video-encode-h2645-spec-latest?ref_type=heads">https://gitlab.freedesktop.org/airlied/mesa/-/tree/radv-vulkan-video-encode-h2645-spec-latest?ref_type=heads</a></p><p>[2] <a href="https://github.com/cyanreg/FFmpeg/commits/vulkan/">https://github.com/cyanreg/FFmpeg/commits/vulkan/</a><br /></p>2023-12-19T20:29:27+00:00Simon Ser: Status update, December 2023
https://emersion.fr/blog/2023/status-update-59/
<p>Hi all!</p>
<p>This month we’ve finally released <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/releases/0.17.0">wlroots 0.17.0</a>! It’s been a long time since
the previous release (1 year), we’ll try to ship future releases a bit more
frequently. We’re preparing 0.17.1 with a collection of bugfixes, it should be
ready soon.</p>
<p>I’ve been working on <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4480"><code>wlr_surface_synced</code></a>, a new wlroots abstraction to allow
surface commits coming from clients to be delayed. This is required to avoid
stalling the whole compositor if a client's GPU work is slow, and to implement
explicit synchronization. I’ve also been working on a <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/256">commit-queue-v1</a>
implementation for wlroots and gamescope, which will allow us to get rid of a
CPU wait in Mesa. And I’ve put some finishing touches on Rose’s
<a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4307">frame scheduler patches</a>. Last, I’ve merged André Almeida’s kernel patches for
atomic async page-flips, making it so modern compositors can enable tearing
page-flips without having to go through the legacy KMS uAPI.</p>
<p>I’ve added OAuth refresh tokens to meta.sr.ht. Having to renew OAuth tokens
every year on my clients is annoying, with refresh tokens that’s a thing of the
past! I’ve already updated hottub (CI bridge for GitHub) to leverage this, and
I’d like to also implement this in hut (CLI tool) and yojo (CI bridge for
Codeberg). Note that since meta.sr.ht has only now started returning refresh
tokens on login, users will need to re-login one last time so that the OAuth
clients can grab the refresh token.</p>
<p>The <abbr title="New Project of the Month">NPotM</abbr> is a bit peculiar: I
haven’t actually started working on it this month, and it’s not in a usable
state yet. It’s <a href="https://git.sr.ht/~emersion/go-sqlgen">go-sqlgen</a>, a Go code generator which takes SQL as input. The
goal is to store SQL queries in a separate file, to make them safer (type
checking for the arguments) and faster (prepared statements). It’s somewhat
similar to <a href="https://github.com/sqlc-dev/sqlc">sqlc</a> except it aims at being simpler and database-agnostic.
There’s still much to do: I’d like to add support for named parameters, check
that the number of parameters in the query matches the number of procedure
arguments, and make it easy to write migrations. I’m not yet sure go-sqlgen is
worth the trouble: being database-agnostic limits its abilities, perhaps too
much.</p>
<p>Then comes the usual mix of random smaller updates. I’ve released <a href="https://git.sr.ht/~emersion/soju/refs/v0.7.0">soju 0.7.0</a>
and <a href="https://git.sr.ht/~emersion/goguma/refs/v0.6.0">goguma 0.6.0</a> with a few new features and bugfixes. <a href="https://sr.ht/~emersion/pyonji/">pyonji</a> now
understands the <a href="https://b4.docs.kernel.org/en/latest/">b4</a> config file, so it’s possible to add this file to your
project to preconfigure pyonji with a mailing list (<a href="https://git.sr.ht/~emersion/pyonji/tree/2b35313afaf619067f90b0a418cba1b65ef6de9f/item/.b4-config">example</a>).
delthas has implemented account data import in hut, so it’s now easy to migrate
accounts between sr.ht instances, or projects between accounts. <a href="https://git.sr.ht/~emersion/go-scfg">go-scfg</a> now
supports decoding a configuration file directly into a Go struct, making it
unnecessary to hand-roll parsing code (<a href="https://godocs.io/git.sr.ht/~emersion/go-scfg#example-Decoder">example</a>).</p>
<p>I’ll be giving a <a href="https://fosdem.org/2024/schedule/event/fosdem-2024-2647--protocols-go-imap-v2-things-i-wish-i-knew-before-starting-to-write-an-imap-library/">FOSDEM talk</a> about quirks and gotchas of the IMAP protocol
this year. I’ll be happy to say hi if any of you are coming as well. That’s all
I have for this month, see you in January!</p>2023-12-17T22:00:00+00:00Peter Hutterer: Xorg being removed. What does this mean?
http://who-t.blogspot.com/2023/12/xorg-being-removed-what-does-this-mean.html
<p>
You may have seen the news that <a href="https://www.redhat.com/en/blog/rhel-10-plans-wayland-and-xorg-server">Red Hat Enterprise Linux 10 plans to remove Xorg</a>. But Xwayland will stay around, and given the name overloading and them sharing a git repository there's some confusion over what is Xorg. So here's a very simple "picture". This is the xserver git repository:
</p><pre>$ tree -d -L 2 xserver
xserver
├── composite
├── config
├── damageext
├── dbe
├── dix
├── doc
│ └── dtrace
├── dri3
├── exa
├── fb
├── glamor
├── glx
├── hw
│ ├── kdrive
│ ├── vfb
│ ├── xfree86 <- this one is Xorg
│ ├── xnest
│ ├── xquartz
│ ├── xwayland
│ └── xwin
├── include
├── m4
├── man
├── mi
├── miext
│ ├── damage
│ ├── rootless
│ ├── shadow
│ └── sync
├── os
├── present
├── pseudoramiX
├── randr
├── record
├── render
├── test
│ ├── bigreq
│ ├── bugs
│ ├── damage
│ ├── scripts
│ ├── sync
│ ├── xi1
│ └── xi2
├── Xext
├── xfixes
├── Xi
└── xkb
</pre>
The git repo produces several X servers, including the one designed to run on bare metal: Xorg (in <i>hw/xfree86</i> for historical reasons). The other <i>hw</i> directories are the other X servers including Xwayland. All the other directories are core X server functionality that's shared between all X servers [1]. Removing Xorg from a distro but keeping Xwayland means building with <i>--disable-xfree86 --enable-xwayland</i> [1]. That's simply it (plus the resulting distro packaging work of course).
<p></p>
<p>Removing Xorg means you need something else that runs on bare metal and that is your favourite Wayland compositor. Xwayland then talks to that while presenting an X11-compatible socket to existing X11 applications.</p>
<p>
Of course all this means that the X server repo will continue to see patches and many of those will also affect Xorg. For those who are running git master anyway. Don't get your hopes up for more Xorg releases beyond the security update background noise [2].
</p>
<p>
Xwayland on the other hand is actively maintained and will continue to see releases. But those releases are a sequence [1] of
</p><pre>$ git new-branch xwayland-23.x.y
$ git rm hw/{kdrive,vfb,xfree86,xnest,xquartz,xwin}
$ git tag xwayland-23.x.y
</pre>
In other words, an Xwayland release is the xserver git master branch <i>with all X servers but Xwayland removed</i>. That's how Xwayland can see new updates and releases without Xorg ever seeing those (except on git master of course). And that's how your installed Xwayland has code from 2023 while your installed Xorg is still stuck on the branch created and barely updated after 2021.
<p></p>
<p>I hope this helps a bit with the confusion of the seemingly mixed messages sent when you see headlines like "Xorg is unmaintained", "X server patches to fix blah", "Xorg is abandoned", "new Xwayland release".</p>
<p>
<small>
[1] not 100% accurate but close enough<br />
[2] historically an Xorg release included all other X servers (Xquartz, Xwin, Xvfb, ...) too so this applies to those servers too unless they adopt the Xwayland release model<br />
</small>
</p>2023-12-14T04:13:17+00:00Melissa Wen: 15 Tips for Debugging Issues in the AMD Display Kernel Driver
https://melissawen.github.io/blog/2023/12/13/amd-display-debugging-tips
<p><em>A self-help guide for examining and debugging the AMD display driver within the
Linux kernel/DRM subsystem.</em></p>
<p>These tips are based on my experience as an external developer working on the
driver and are shared with the goal of helping others navigate the driver code.</p>
<p><strong>Acknowledgments:</strong> These tips were gathered thanks to the countless help
received from AMD developers during the driver development process. The list
below was obtained by examining open source code, reviewing public
documentation, playing with tools, asking in public forums and also with the
help of my former GSoC mentor, <a href="https://siqueira.tech/">Rodrigo Siqueira</a>.</p>
<h2 id="pre-debugging-steps">Pre-Debugging Steps:</h2>
<p>Before diving into an issue, it’s crucial to perform two essential steps:</p>
<p><strong>1) Check the latest changes:</strong> Ensure you’re working with the latest AMD
driver modifications located in the
<a href="https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next">amd-staging-drm-next branch</a>
maintained by Alex Deucher. You may also find bug fixes for newer kernel
versions on branches that have the name pattern <code class="language-plaintext highlighter-rouge">drm-fixes-&lt;date&gt;</code>.</p>
<p><strong>2) Examine the issue tracker:</strong> Confirm that your issue isn’t already
documented and addressed in the AMD display driver issue tracker. If you find a
similar issue, you can team up with others and speed up the debugging process.</p>
<h2 id="understanding-the-issue">Understanding the issue:</h2>
<p>Do you really need to change this? Where should you start looking for changes?</p>
<p><strong>3) Is the issue in the AMD kernel driver or in the userspace?:</strong> Identifying
the source of the issue is essential regardless of the GPU vendor. Sometimes
this can be challenging so here are some helpful tips:</p>
<ul>
<li>Record the screen: Capture the screen using a recording app while
experiencing the issue. If the bug appears in the capture, it’s likely a
userspace issue, not the kernel display driver.</li>
<li>Analyze the dmesg log: Look for error messages related to the display
driver in the dmesg log. If the error message appears before the message
“<code class="language-plaintext highlighter-rouge">[drm] Display Core v...</code>”, it’s not likely a display driver issue. If this
message doesn’t appear in your log at all, the display driver wasn’t fully
loaded, and the log should contain a notification of what went wrong.</li>
</ul>
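<p>As a minimal sketch of the dmesg check above — the log line here is a hard-coded sample for illustration; on a real system you would pipe <code class="language-plaintext highlighter-rouge">sudo dmesg</code> into the same <code class="language-plaintext highlighter-rouge">grep</code>:</p>

```shell
# Sample dmesg line (hard-coded for illustration); on a real system run:
#   sudo dmesg | grep '\[drm\] Display Core v'
log='[drm] Display Core v3.2.241 initialized on DCN 2.1'

# If the "Display Core v..." line is present, the display driver fully loaded.
if printf '%s\n' "$log" | grep -q '\[drm\] Display Core v'; then
  status="display driver loaded"
else
  status="display driver not fully loaded"
fi
echo "$status"
```

<p>Errors logged before this line are unlikely to come from the display driver; errors after it are worth a closer look.</p>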
<p><strong>4) AMD Display Manager vs. AMD Display Core:</strong> The AMD display driver
consists of two components:</p>
<ul>
<li>Display Manager (DM): This component interacts directly with the Linux DRM
infrastructure. Occasionally, issues can arise from misinterpretations of DRM
properties or features. If the issue doesn’t occur on other platforms with the
same AMD hardware - for example, only happens on Linux but not on Windows -
it’s more likely related to the AMD DM code.</li>
<li>Display Core (DC): This is the platform-agnostic part responsible for setting
and programming hardware features. Modifications to the DC usually require
validation on other platforms, like Windows, to avoid regressions.</li>
</ul>
<p><strong>5) Identify the DC HW family:</strong> Each AMD GPU has variations in its hardware
architecture. Features and helpers differ between families, so determining the
relevant code for your specific hardware is crucial.</p>
<ul>
<li>Find GPU product information <a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/driver-misc.html#gpu-product-information">in Linux/AMD GPU documentation</a></li>
<li>Check the dmesg log for the Display Core version (since <a href="https://github.com/torvalds/linux/commit/bf7fda0b3736f93ac8b18e7147e1e7acd27e6a19">this commit</a>
in Linux kernel 6.3v). For example:
<ul>
<li><code class="language-plaintext highlighter-rouge">[drm] Display Core v3.2.241 initialized on DCN 2.1</code></li>
<li><code class="language-plaintext highlighter-rouge">[drm] Display Core v3.2.237 initialized on DCN 3.0.1</code></li>
</ul>
</li>
</ul>
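<p>A small sketch of pulling the two pieces of information out of that log line — the sample line is hard-coded here; on a real system you would feed it the matching line from <code class="language-plaintext highlighter-rouge">dmesg</code>:</p>

```shell
# Sample "Display Core" line as printed since kernel 6.3 (hard-coded here)
line='[drm] Display Core v3.2.241 initialized on DCN 2.1'

# Extract the DC release version and the DCN hardware family with sed
dc_ver=$(printf '%s\n' "$line" | sed -n 's/.*Display Core v\([0-9.]*\) .*/\1/p')
dcn=$(printf '%s\n' "$line" | sed -n 's/.*on DCN \(.*\)$/\1/p')
echo "DC_VER=$dc_ver DCN=$dcn"
```

<p>The DCN number is what you need for the next step of narrowing down the relevant directory.</p>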
<h2 id="investigating-the-relevant-driver-code">Investigating the relevant driver code:</h2>
<p>Keep unrelated driver code from clouding your investigation.</p>
<p><strong>6) Narrow the code inspection down to one DC HW family:</strong> the relevant code
resides in a directory named after the DC number. For example, the DCN 3.0.1
driver code is located at <code class="language-plaintext highlighter-rouge">drivers/gpu/drm/amd/display/dc/dcn301</code>. AMD’s shared
code is huge, and you can use these boundaries to rule out code unrelated to
your issue.</p>
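<p>The mapping from the DCN version in the log to the directory name is mechanical: drop the dots and prefix <code class="language-plaintext highlighter-rouge">dcn</code>. A tiny illustrative helper (not part of the driver):</p>

```shell
# Map a DCN version string to its driver directory name, e.g. "3.0.1" -> dcn301
dcn_dir() {
  printf 'dcn%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

dcn_dir "3.0.1"
dcn_dir "2.1"
# The full path is then: drivers/gpu/drm/amd/display/dc/$(dcn_dir 3.0.1)
```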
<p><strong>7) Newer families may inherit code from older ones:</strong> you can find dcn301
using code from dcn30, dcn20, dcn10 files. It’s crucial to verify which hooks
and helpers your driver utilizes to investigate the right portion. You can
leverage <code class="language-plaintext highlighter-rouge">ftrace</code> for supplemental validation. To give an example, it was
useful when I was updating DCN3 color mapping to correctly use their new
post-blending color capabilities, such as:</p>
<ul>
<li><a href="https://lore.kernel.org/dri-devel/20230721132431.692158-1-mwen@igalia.com/">[PATCH] drm/amd/display: set stream gamut remap matrix to MPC for DCN3+</a></li>
</ul>
<p>Additionally, you can use two different HW families to compare behaviours.
If you see the issue in one but not in the other, you can compare the code and
understand what has changed and whether the implementation inherited from a
previous family doesn’t fit the new HW resources or design well. You can also count on the help
of the community on the
<a href="https://gitlab.freedesktop.org/drm/amd/-/issues/">Linux AMD issue tracker</a>
to validate your code on other hardware and/or systems.</p>
<p>This approach helped me debug
<a href="https://gitlab.freedesktop.org/drm/amd/-/issues/1513#note_2003082">a 2-year-old issue</a>
where the cursor gamma adjustment was incorrect in DCN3 hardware, but working
correctly for the DCN2 family. I solved the issue in two steps, thanks to
community feedback and validation:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20230731083505.1500965-1-mwen@igalia.com/">[PATCH] drm/amd/display: check attr flag before set cursor degamma on DCN3+</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230824133810.10627-1-mwen@igalia.com/">[PATCH] drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma</a></li>
</ul>
<p><strong>8) Check the hardware capability screening in the driver:</strong> You can currently find a
list of display hardware capabilities in the
<code class="language-plaintext highlighter-rouge">drivers/gpu/drm/amd/display/dc/dcn*/dcn*_resource.c</code> file. More precisely in
the <code class="language-plaintext highlighter-rouge">dcn*_resource_construct()</code> function.
Using DCN301 for illustration, here is the list of its hardware caps:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> /*************************************************
* Resource + asic cap harcoding *
*************************************************/
pool->base.underlay_pipe_index = NO_UNDERLAY_PIPE;
pool->base.pipe_count = pool->base.res_cap->num_timing_generator;
pool->base.mpcc_count = pool->base.res_cap->num_timing_generator;
dc->caps.max_downscale_ratio = 600;
dc->caps.i2c_speed_in_khz = 100;
dc->caps.i2c_speed_in_khz_hdcp = 5; /*1.4 w/a enabled by default*/
dc->caps.max_cursor_size = 256;
dc->caps.min_horizontal_blanking_period = 80;
dc->caps.dmdata_alloc_size = 2048;
dc->caps.max_slave_planes = 2;
dc->caps.max_slave_yuv_planes = 2;
dc->caps.max_slave_rgb_planes = 2;
dc->caps.is_apu = true;
dc->caps.post_blend_color_processing = true;
dc->caps.force_dp_tps4_for_cp2520 = true;
dc->caps.extended_aux_timeout_support = true;
dc->caps.dmcub_support = true;
/* Color pipeline capabilities */
dc->caps.color.dpp.dcn_arch = 1;
dc->caps.color.dpp.input_lut_shared = 0;
dc->caps.color.dpp.icsc = 1;
dc->caps.color.dpp.dgam_ram = 0; // must use gamma_corr
dc->caps.color.dpp.dgam_rom_caps.srgb = 1;
dc->caps.color.dpp.dgam_rom_caps.bt2020 = 1;
dc->caps.color.dpp.dgam_rom_caps.gamma2_2 = 1;
dc->caps.color.dpp.dgam_rom_caps.pq = 1;
dc->caps.color.dpp.dgam_rom_caps.hlg = 1;
dc->caps.color.dpp.post_csc = 1;
dc->caps.color.dpp.gamma_corr = 1;
dc->caps.color.dpp.dgam_rom_for_yuv = 0;
dc->caps.color.dpp.hw_3d_lut = 1;
dc->caps.color.dpp.ogam_ram = 1;
// no OGAM ROM on DCN301
dc->caps.color.dpp.ogam_rom_caps.srgb = 0;
dc->caps.color.dpp.ogam_rom_caps.bt2020 = 0;
dc->caps.color.dpp.ogam_rom_caps.gamma2_2 = 0;
dc->caps.color.dpp.ogam_rom_caps.pq = 0;
dc->caps.color.dpp.ogam_rom_caps.hlg = 0;
dc->caps.color.dpp.ocsc = 0;
dc->caps.color.mpc.gamut_remap = 1;
dc->caps.color.mpc.num_3dluts = pool->base.res_cap->num_mpc_3dlut; //2
dc->caps.color.mpc.ogam_ram = 1;
dc->caps.color.mpc.ogam_rom_caps.srgb = 0;
dc->caps.color.mpc.ogam_rom_caps.bt2020 = 0;
dc->caps.color.mpc.ogam_rom_caps.gamma2_2 = 0;
dc->caps.color.mpc.ogam_rom_caps.pq = 0;
dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
dc->caps.color.mpc.ocsc = 1;
dc->caps.dp_hdmi21_pcon_support = true;
/* read VBIOS LTTPR caps */
if (ctx->dc_bios->funcs->get_lttpr_caps) {
enum bp_result bp_query_result;
uint8_t is_vbios_lttpr_enable = 0;
bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, &is_vbios_lttpr_enable);
dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable;
}
if (ctx->dc_bios->funcs->get_lttpr_interop) {
enum bp_result bp_query_result;
uint8_t is_vbios_interop_enabled = 0;
bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, &is_vbios_interop_enabled);
dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) && !!is_vbios_interop_enabled;
}
</code></pre></div></div>
<p>Keep in mind that the documentation of color capabilities is available at
<a href="https://docs.kernel.org/gpu/amdgpu/display/display-manager.html#dc-color-capabilities-between-dcn-generations">the Linux kernel Documentation</a>.</p>
<h2 id="understanding-the-development-history">Understanding the development history:</h2>
<p>What has brought us to the current state?</p>
<p><strong>9) Pinpoint relevant commits:</strong> Use <code class="language-plaintext highlighter-rouge">git log</code> and <code class="language-plaintext highlighter-rouge">git blame</code> to identify commits
targeting the code section you’re interested in.</p>
<p><strong>10) Track regressions:</strong> If you’re examining the <code class="language-plaintext highlighter-rouge">amd-staging-drm-next</code>
branch, check for regressions between DC release versions. These are defined by
<code class="language-plaintext highlighter-rouge">DC_VER</code> in the <code class="language-plaintext highlighter-rouge">drivers/gpu/drm/amd/display/dc/dc.h</code> file. Alternatively,
find a commit with the format <code class="language-plaintext highlighter-rouge">drm/amd/display: 3.2.221</code>, which marks a
display release and makes a useful bisect point. This information helps you
understand how outdated your branch is and identify potential regressions. As a
rule of thumb, <code class="language-plaintext highlighter-rouge">DC_VER</code> is bumped roughly once a week. Finally, check the
testing log of each release in the report posted on the <code class="language-plaintext highlighter-rouge">amd-gfx</code> mailing
list (look for <code class="language-plaintext highlighter-rouge">Tested-by: Daniel Wheeler</code>), such as:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/DS0PR12MB65344F38E185B7DD4E32A4F29C8FA@DS0PR12MB6534.namprd12.prod.outlook.com/">RE: [PATCH 00/13] DC Patches for Dec 11, 2023</a></li>
</ul>
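<p>A quick way to read the current display release of a branch is to extract <code class="language-plaintext highlighter-rouge">DC_VER</code> from <code class="language-plaintext highlighter-rouge">dc.h</code>. The define is hard-coded below as a sample; on a real checkout you would read <code class="language-plaintext highlighter-rouge">drivers/gpu/drm/amd/display/dc/dc.h</code> instead:</p>

```shell
# Sample of the DC_VER define from dc.h (hard-coded here for illustration)
dc_h='#define DC_VER "3.2.241"'

# Pull out the quoted version string
dc_ver=$(printf '%s\n' "$dc_h" | sed -n 's/.*DC_VER "\([0-9.]*\)".*/\1/p')
echo "display release: $dc_ver"
# On a real tree, release-bump commits can be listed with, e.g.:
#   git log --oneline --grep='drm/amd/display: 3\.2\.'
```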
<h2 id="reducing-the-inspection-area">Reducing the inspection area:</h2>
<p>Focus on what really matters.</p>
<p><strong>11) Identify involved HW blocks:</strong> This helps isolate the issue. You can find
more information about DCN HW blocks in the
<a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dcn-overview.html">DCN Overview documentation</a>.
In summary:</p>
<ul>
<li>Plane issues are closer to HUBP and DPP.</li>
<li>Blending/Stream issues are closer to MPC, OPP and OPTC. They are related
to DRM CRTC subjects.</li>
</ul>
<p>This information was useful when debugging a hardware rotation issue where
<a href="https://gitlab.freedesktop.org/drm/amd/-/issues/2247#note_1747639">the cursor plane got clipped off in the middle of the screen</a>.</p>
<p>Finally, the issue was addressed by two patches:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20221118125935.4013669-22-Brian.Chang@amd.com/">[PATCH 21/22] drm/amd/display: Fix rotated cursor offset calculation</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230131160546.150611-1-mwen@igalia.com/">[PATCH] drm/amd/display: fix cursor offset on rotation 180</a></li>
</ul>
<p><strong>12) Issues around bandwidth (glitches) and clocks:</strong> These may be affected by
calculations done in these HW blocks and by HW-specific values. The
recalculation equations are found in the DML folder.
DML stands for Display Mode Library. It’s in charge of all required
configuration parameters supported by the hardware for multiple scenarios. See
more in the <a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dcn-overview.html#amd-hardware-pipeline">AMD DC Overview kernel docs</a>.
It’s a math library that optimally configures hardware to find the best
balance between power efficiency and performance in a given scenario.</p>
<p>Finding some clk variables that affect device behavior may be a sign of it.
It’s hard for an external developer to debug this part, since it involves
information from HW specs and firmware programming that we don’t have access to.
The best option is to provide all relevant debugging information you have and
ask AMD developers to check the values you suspect.</p>
<ul>
<li><em>A trick: if you suspect the power setup is degrading performance, try
setting the amount of power supplied to the GPU to the maximum and see if
it affects the system behavior with this command:
<code class="language-plaintext highlighter-rouge">sudo bash -c "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level"</code></em></li>
</ul>
<p>I learned it when debugging
<a href="https://gitlab.freedesktop.org/drm/amd/-/issues/2247#note_1748842">glitches with hardware cursor rotation on Steam Deck</a>.
My first attempt was <a href="https://lore.kernel.org/amd-gfx/20230207233235.513948-1-mwen@igalia.com">changing the clock calculation</a>.
In the end, Rodrigo Siqueira proposed the right solution targeting bandwidth in
two steps:</p>
<ul>
<li><a href="https://patchwork.freedesktop.org/series/114632/">Patch series to create a new internal commit sequence</a></li>
<li><a href="https://patchwork.freedesktop.org/patch/526108/?series=114927&rev=1">Enabling pipe split on DCN301</a></li>
</ul>
<h2 id="checking-implicit-programming-and-hardware-limitations">Checking implicit programming and hardware limitations:</h2>
<p>Bring implicit programming to the level of consciousness and recognize hardware
limitations.</p>
<p><strong>13) Implicit update types:</strong> Check if the selected type for atomic update may
affect your issue. The update type depends on the mode settings, since
programming some modes demands more time for hardware processing. More details
in the
<a href="https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/amd/display/dc/dc.h">source code</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Surface update type is used by dc_update_surfaces_and_stream
* The update type is determined at the very beginning of the function based
* on parameters passed in and decides how much programming (or updating) is
* going to be done during the call.
*
* UPDATE_TYPE_FAST is used for really fast updates that do not require much
* logical calculations or hardware register programming. This update MUST be
* ISR safe on windows. Currently fast update will only be used to flip surface
* address.
*
* UPDATE_TYPE_MED is used for slower updates which require significant hw
* re-programming however do not affect bandwidth consumption or clock
* requirements. At present, this is the level at which front end updates
* that do not require us to run bw_calcs happen. These are in/out transfer func
 * updates, viewport offset changes, recout size changes and pixel depth changes.
 * This update can be done at ISR, but we want to minimize how often this happens.
 *
 * UPDATE_TYPE_FULL is slow. Really slow. This requires us to recalculate our
 * bandwidth and clocks, possibly rearrange some pipes and reprogram anything front
 * end related. Any time viewport dimensions, recout dimensions, scaling ratios or
 * gamma need to be adjusted or pipe needs to be turned on (or disconnected) we do
 * a full update. This cannot be done at ISR level and should be a rare event.
 * Unless someone is stress testing mpo enter/exit, playing with colour or adjusting
 * underscan we don't expect to see this call at all.
 */
enum surface_update_type {
UPDATE_TYPE_FAST, /* super fast, safe to execute in isr */
UPDATE_TYPE_MED, /* ISR safe, most of programming needed, no bw/clk change*/
UPDATE_TYPE_FULL, /* may need to shuffle resources */
};
</code></pre></div></div>
<h2 id="using-tools">Using tools:</h2>
<p>Observe the current state, validate your findings, continue improvements.</p>
<p><strong>14) Use AMD tools to check hardware state and driver programming:</strong> These help
you understand your driver settings and check how the behavior changes when you
modify them.</p>
<ul>
<li>
<p><a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dc-debug.html#dc-visual-confirmation">DC Visual confirmation</a>:
Check multiple planes and pipe split policy.
<a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dc-debug.html#dc-visual-confirmation" title="AMDGPU DC Visual Confirmation on DCN 3.0.1 (Steam Deck)"><img alt="" src="https://github.com/melissawen/melissawen.github.io/blob/amd-display-debug-tip/img/amdgpu_dm_visualconfirmation_pipesplit_screen_deck.jpg?raw=true" /></a></p>
</li>
<li>
<p><a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dc-debug.html#dtn-debug">DTN logs</a>:
Check display hardware state, including rotation, size, format, underflow,
blocks in use, color block values, etc.
<a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dc-debug.html#dtn-debug" title="AMDGPU - DTN log on DCN 2.1"><img alt="" src="https://github.com/melissawen/melissawen.github.io/blob/amd-display-debug-tip/img/amdgpu_dm_dtnlog_dcn21.png?raw=true" /></a></p>
</li>
<li>
<p><a href="https://gitlab.freedesktop.org/tomstdenis/umr">UMR</a>:
Check ASIC info, register values, KMS state - links and elements (framebuffers,
planes, CRTCs, connectors).
<a href="https://gitlab.freedesktop.org/tomstdenis/umr/-/blob/main/doc/sphinx/source/umr_gui_kms_landing.png" title="Screenshot of the KMS tab from AMD UMR documentation"><img alt="" src="https://gitlab.freedesktop.org/tomstdenis/umr/-/raw/main/doc/sphinx/source/umr_gui_kms_landing.png?ref_type=heads" /></a>
Source: <a href="https://gitlab.freedesktop.org/tomstdenis/umr">UMR project documentation</a></p>
</li>
</ul>
<p><strong>15) Use generic DRM/KMS tools:</strong></p>
<ul>
<li>
<p><a href="https://gitlab.freedesktop.org/drm/igt-gpu-tools">IGT test tools</a>: Use
generic KMS tests or develop your own to isolate the issue in the kernel
space. Compare results across different GPU vendors to understand their
implementations and find potential solutions. AMD also has IGT tests specific
to its GPUs that are expected to pass on any AMD hardware. You
can check results of HW-specific tests using different display hardware
families or you can compare expected differences between the generic workflow
and AMD workflow.</p>
</li>
<li>
<p><a href="https://github.com/ascent12/drm_info">drm_info</a>: This tool summarizes the
current state of a display driver (capabilities, properties and formats) per
element of the DRM/KMS workflow. Output can be helpful when reporting bugs.</p>
</li>
</ul>
<h2 id="dont-give-up">Don’t give up!</h2>
<p>Debugging issues in the AMD display driver can be challenging, but by following
these tips and leveraging available resources, you can significantly improve
your chances of success.</p>
<p><strong>Worth mentioning:</strong> This blog post builds upon my talk,
<a href="https://www.youtube.com/watch?v=CMm-yhsMB7U">“I’m not an AMD expert, but…”</a>
presented at the 2022 XDC. It shares guidelines that helped me debug AMD
display issues as an external developer of the driver.</p>
<p><strong>Open Source Display Driver:</strong> The Linux kernel/AMD display driver is open
source, allowing you to actively contribute by addressing issues listed in the
<a href="https://gitlab.freedesktop.org/drm/amd">official tracker</a>. Tackling existing
issues or resolving your own can be a rewarding learning experience and a
valuable contribution to the community. Additionally, the tracker serves as a
valuable resource for finding similar bugs, troubleshooting tips, and
suggestions from AMD developers. Finally, it’s a platform for seeking help when
needed.</p>
<p>Remember, contributing to the open source community through issue resolution
and collaboration is mutually beneficial for everyone involved.</p>2023-12-13T12:25:00+00:00Ricardo Garcia: I'm playing Far Cry 6 on Linux
https://rg3.name/202312081138.html
<div class="paragraph">
<p><strong>2023-12-10 UPDATE</strong>: From Mastodon, <a href="https://mastodon.arcepi.net/@arcepi/111550517846396293">arcepi suggested</a> the instability problems that I described below and served as a motivation to try Far Cry 6 on Linux could be coming from having switched from NVIDIA to AMD without reinstalling Windows, because of leftover files from the NVIDIA drivers. Today morning I reinstalled Windows to test this and, indeed, the Deathloop and Far Cry 6 crashes seem to be gone (yay!). That would have removed my original motivation to try to run the game on Linux, but it doesn’t take away the main points of the post. Do take into account that the instability doesn’t seem to exist anymore (and I hope this applies to more future titles I play) but it’s still the background story to explain why I decided to install Far Cry 6 on my Fedora 39 system, so the original post follows below.</p>
</div>
<div class="paragraph">
<p>If you’ve been paying attention to the evolution of the Linux gaming ecosystem in recent years, including the release of the Steam Deck and the new Steam Deck OLED, it’s likely your initial reaction to the blog post title is a simple “OK”.
However, I’m coming from a very particular place so I wanted to explain my point of view and the significance of this, and hopefully you’ll find the story interesting.</p>
</div>
<div class="imageblock">
<div class="content">
<a class="image" href="https://rg3.name/img/steam-running-on-fedora-39.png"><img alt="steam running on fedora 39.tn" src="https://rg3.name/img/steam-running-on-fedora-39.tn.png" /></a>
</div>
<div class="title">Figure 1. Steam running on Fedora Linux 39</div>
</div>
<div class="paragraph">
<p>As a background, let me say I’ve always gamed on Windows when using my PC.
If you think I’m an idiot for doing so lately, especially because my work at <a href="https://www.igalia.com/">Igalia</a> involves frequently interacting with Valve contractors like Samuel Pitoiset, Timur Kristóf, Mike Blumenkrantz or Hans-Kristian Arntzen, you’d be more than right.
But hear me out.
I’ve always gamed on Windows because it’s the safe bet.
With a couple of small kids at home and very limited free time, when I game everything has to just work.
No fiddling around with software, config files, or wasting time setting up the software stack.
I’m supposed to boot Windows when I want to play, play, and then turn my computer off.
The experience needs to be as close to a console as possible.
And, for anything non-gaming, which is most of it, I’d be using my Linux system.</p>
</div>
<div class="paragraph">
<p>In the last years, thanks to the work done by <a href="https://www.valvesoftware.com/">Valve</a>, the Linux gaming stack has improved a lot.
Despite this, I’ve kept gaming on Windows for a variety of reasons:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>For a long time, my Linux disk only had a capacity of 128GB, so installing games was not a real possibility due to the amount of disk space they need.</p>
</li>
<li>
<p>Also, I was running Slackware and installing Steam and getting the whole thing running implied a fair amount of fiddling I didn’t even want to think about.</p>
</li>
<li>
<p>Then, when I was running Fedora on a large disk, I had kids and I didn’t want to take any risks or possibly waste time on that.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>So, what changed?</p>
</div>
<div class="imageblock">
<div class="content">
<img alt="sapphire pulse amd rx 6700 box" src="https://rg3.name/img/sapphire-pulse-amd-rx-6700-box.jpg" />
</div>
<div class="title">Figure 2. Sapphire Pulse AMD Radeon RX 6700 box</div>
</div>
<div class="paragraph">
<p>Earlier this year I upgraded my PC and replaced an old Intel Haswell i7-4770k with a Ryzen R5 7600X, and my GPU changed from an NVIDIA GTX 1070 to a Radeon RX 6700.
The jump in CPU power was much bigger and impressive than the more modest jump in GPU power.
But talking about that and the sorry state of the GPU market is a story for another blog post.
In any case, I had put up with the NVIDIA proprietary driver for many years and I think, on Windows and for gaming, NVIDIA is the obvious first choice for many people, including me.
Dealing with the proprietary blob under Linux was not particularly problematic, specially with the excellent way it’s handled by <a href="https://rpmfusion.org/Howto/NVIDIA">RPMFusion on Fedora</a>, where essentially you only have to install a few packages and you can mostly forget about it.</p>
</div>
<div class="paragraph">
<p>However, given my recent professional background I decided to go with an AMD card for the first time.
I wanted to use a fully open source graphics stack and I didn’t want to think about making compromises in Wayland support or other fronts whatsoever.
Plus, at the time I upgraded my PC, the timing was almost perfect for me to switch to an AMD card, because:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>AMD cards were, in general, performing better than NVIDIA cards at the same price, except for ray tracing.</p>
</li>
<li>
<p>The RX 6700 non-XT was on sale.</p>
</li>
<li>
<p>It had the same performance as a PS5 or so.</p>
</li>
<li>
<p>It didn’t draw a ton of power like many recent high-end GPUs (175W, similar to the 1070 and its 150W TDP).</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>After the system upgrade, I did notice a few more stability problems when gaming under Windows, compared to what I was used to with an NVIDIA card.
You can find thousands of opinions, comments and anecdotes on the Internet about the quality of AMD drivers, and a lot of people say they’re a couple of steps below NVIDIA drivers.
It’s not my intention at all to pile up on those, but it’s true my own personal experience is having generally more crashes in games and having to face more weird situations since I switched to AMD.
Normally, it doesn’t get to the point of being annoying at all, but sometimes it’s a bit surprising and I could definitely notice that increase in instability without any bias on my side, I believe.
Which takes us to Far Cry 6.</p>
</div>
<div class="paragraph">
<p>A few days ago I finished playing Doom Eternal and its expansions (really nice game, by the way!) and I decided to go with Far Cry 6 next.
I’m slowly working my way up through some more graphically demanding games that I didn’t feel comfortable playing on the 1070.
I went ahead and installed the game on Windows.
Being a big 70GB download (100GB on disk), that took a bit of time.
Then I launched it, adjusted the keyboard and mouse settings to my liking and I went to the video options menu.
The game had chosen the high preset for me and everything looked good, so I attempted to run the in-game benchmark to see if the game performed well with that preset (I love it when games have built-in benchmarks!).
After a few seconds in a loading screen, the game crashed and I was back to the desktop.
“Oh, what a bad way to start!”, I thought, without knowing what lay ahead.
I launched the game again, same thing.</p>
</div>
<div class="paragraph">
<p>Over the course of the 2 hours that followed, I tried everything:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Launching the main game instead of the benchmark, just in case the bug only happened in the benchmark. Nope.</p>
</li>
<li>
<p>Lowering quality and resolution.</p>
</li>
<li>
<p>Disabling any advanced setting.</p>
</li>
<li>
<p>Trying windowed mode, or borderless full screen.</p>
</li>
<li>
<p>Vsync off or on.</p>
</li>
<li>
<p>Disabling the overlays for Ubisoft, Steam, AMD.</p>
</li>
<li>
<p>Rebooting multiple times.</p>
</li>
<li>
<p>Uninstalling the drivers normally as well as using DDU and installing them again.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>Same result every time.
I also searched on the web for people having similar problems, but got no relevant search results anywhere.
Yes, a lot of people both using AMD and NVIDIA had gotten crashes somewhere in the game under different circumstances, but nobody mentioned specifically being unable to reach any gameplay at all.
That day I went to bed tired and a bit annoyed.
I was also close to having run the game for 2 hours according to Steam, which is the limit for refunds if I recall correctly.
I didn’t want to <em>refund</em> the game, though, I wanted to <em>play</em> it.</p>
</div>
<div class="paragraph">
<p>The next day I was ready to uninstall it and move on to another title in my list but, out of pure curiosity, given that I had already spent a good amount of time trying to make it run, I searched for it on the Proton compatibility database to see if it could be run on Linux, and it seemed to be <a href="https://www.protondb.com/app/2369390/">possible</a>.
The game appeared to be well supported and it was verified to run on the Deck, which was good because both the Deck and my system have an RDNA2 GPU.
In my head I wasn’t fully convinced this could work, because I didn’t know if the problem was in the game (maybe a bug with recent updates) or the drivers or anywhere else (like a hardware problem).</p>
</div>
<div class="paragraph">
<p>And this was, for me, when the fun started.
I installed Steam on Linux from the Gnome Software app.
For those who don’t know it, it’s like an app store for Gnome that acts as a frontend to the package manager.</p>
</div>
<div class="imageblock">
<div class="content">
<a class="image" href="https://rg3.name/img/gnome-software-steam.png"><img alt="gnome software steam.tn" src="https://rg3.name/img/gnome-software-steam.tn.png" /></a>
</div>
<div class="title">Figure 3. Gnome Software showing Steam as an installed application</div>
</div>
<div class="paragraph">
<p>Steam showed up there with 3 possible sources: Flathub, an “rpmfusion-nonfree-steam” repo and the more typical “rpmfusion-nonfree” repo.
I went with the last option and soon I had Steam in my list of apps.
I launched that and authenticated using the Steam mobile app QR code scanning function for logging in (which is a really cool way to log in, by the way, without needing to recall your username and password).</p>
</div>
<div class="paragraph">
<p>My list of installed games was empty and I couldn’t find a way to install Far Cry 6 because it was not available for Linux.
However, I thought there should be an easy way to install it and launch it using the famous Proton compatibility layer, and a quick web search revealed I only had to right-click on the game title, select Properties and choose to “Force the use of a specific Steam Play compatibility tool” under the Compatibility section.
Click-click-click and, sure, the game was ready to install.
I let it download again and launched it.</p>
</div>
<div class="imageblock">
<div class="content">
<img alt="Context menu shown after right-clicking on Far Cry 6 on the Steam application, with the Properties option highlighted" src="https://rg3.name/img/far-cry-6-steam-context-menu.png" />
</div>
</div>
<div class="imageblock">
<div class="content">
<img alt="Far Cry 6 Compatibility tab displaying the option to force the use of a specific Steam Play compatibility tool" src="https://rg3.name/img/far-cry-6-compatibility-tab.png" />
</div>
</div>
<div class="paragraph">
<p>Some stuff pops up about processing or downloading Vulkan shaders and I see it doing some work.
In that first launch, the game takes more time to start compared to what I had seen under Windows, but it ends up launching (and subsequent launches were noticeably faster).
That includes some Ubisoft Connect stuff showing up before the game starts and so on.
Intro videos play normally and I reach the game menu in full screen.
No indication that I was running it on Linux whatsoever.
I go directly to the video options menu, see that the game again selected the high preset, I turn off VSync and launch the benchmark.
Sincerely, honestly, completely and totally expecting it to crash one more time and that would’ve been OK, pointing to a game bug.
But no, for the first time in two days this is what I get:</p>
</div>
<div class="imageblock">
<div class="content">
<a class="image" href="https://rg3.name/img/far-cry-6-benchmark-screenshot.png"><img alt="far cry 6 benchmark screenshot.tn" src="https://rg3.name/img/far-cry-6-benchmark-screenshot.tn.png" /></a>
</div>
<div class="title">Figure 4. Far Cry 6 benchmark screenshot displaying the game running at over 100 frames per second</div>
</div>
<div class="paragraph">
<p>The benchmark runs perfectly, no graphical glitches, no stuttering, frame rates above 100FPS normally, and I had a genuinely happy and surprised grin on my face.
I laughed out loud and my wife asked what was so funny.
Effortless.
No command lines, no config files, nothing.</p>
</div>
<div class="paragraph">
<p>As of today, I’ve played the game for over 30 hours and the game has crashed exactly once out of the blue.
And I think it was an unfortunate game bug.
The rest of the time it’s been running as smooth and as perfect as the first time I ran the benchmark.
Framerate is completely fine and way over the 0 frames per second I got on Windows because it wouldn’t run.
The only problem seems to be that, when I finish playing and exit to the desktop, Steam is unable to stop the game completely (I don’t know the cause) and it shows up as still running.
I usually click on the Stop button in the Steam interface after a few seconds, it stops the game and that’s it.
No problem synchronizing game saves to the cloud or anything.
Just that small bug that, again, only requires a single extra click.</p>
</div>
<div class="paragraph">
<p><strong>2023-12-10 UPDATE</strong>: From Mastodon, <a href="https://peoplemaking.games/@jaco/111549439605865976">Jaco G</a> and <a href="https://floss.social/@berto/111549797721087193">Berto Garcia</a> tell me the game not stopping problem is present in all Ubisoft games and is directly related to the Ubisoft launcher. It keeps running after closing the game, which makes Steam think the game is still running. You can try to close it from the tray if you see the Ubisoft icon there and, if that fails, you can stop the game like I described above.</p>
</div>
<div class="paragraph">
<p>Then I remembered something that had happened a few months before, prior to starting to play Doom Eternal under Windows.
I had tried to play Deathloop first, another game in my backlog.
However, the game crashed every few minutes and an error window popped up.
The frequency and timing of the crashes didn’t look constant, and lowering the graphics settings would sometimes let me play a bit longer, but in any case I wasn’t able to finish the game intro level without crashes and getting very annoyed.
Searching for the error message on the web, it looked like a game problem that was apparently affecting not only AMD users but also NVIDIA ones, so I had mentally classified it as a game bug and, similarly to the Far Cry 6 case, I had given up on running the game, without refunding it, hoping to be able to play it in the future.</p>
</div>
<div class="paragraph">
<p>Now I was wondering if it really was a game bug and, even if it was, whether Proton had a workaround for it that would let the game be played on Linux.
Again, ProtonDB showed the game to be <a href="https://www.protondb.com/app/1252330">verified on the Deck</a> with encouraging recent reports.
So I installed Deathloop on Linux, launched it just once and played for 20 minutes or so.
No crashes and I got as far as I had gotten on Windows in the intro level.
Again, no graphical glitches that I could see, smooth framerates, etc.
Maybe it was a coincidence and I was lucky, but I think I will be able to play the game without issues when I’m done with Far Cry 6.</p>
</div>
<div class="paragraph">
<p>In conclusion, this story is another data point that tells us the quality of Proton as a product and software compatibility layer is outstanding.
In combination with some high quality open source Mesa drivers like RADV, I’m amazed the experience can actually be better than gaming natively on Windows.
Think about that: the Windows game binary running natively on a DX12 or Vulkan official driver crashes more and doesn’t work as well as the game running on top of a Windows compatibility layer with a graphics API translation layer, on top of a different OS kernel and a different Vulkan driver.
Definitely amazing to me and it speaks wonders of the work Valve has been doing on Linux.
Or it could also speak badly of AMD Windows drivers, or both.</p>
</div>
<div class="paragraph">
<p>Sure, some new games on launch have more compatibility issues, bugs that need fixing, maybe workarounds applied in Proton, etc.
But even in those cases, if you have a bit of patience, play the game some months down the line and check ProtonDB first (ideally before buying the game), you may be in for a great experience.
You don’t need to be an expert either.
Not to mention that some of these details are even better and smoother if you use a Steam Deck as compared to an (officially) unsupported Linux distribution like I do.</p>
</div>2023-12-08T11:38:00+00:00Tomeu Vizoso: Etnaviv NPU update 12: Towards SSDLite MobileDet
https://blog.tomeuvizoso.net/2023/12/etnaviv-npu-update-12-towards-ssdlite.html
<p>During these last two weeks I have been working towards adding support for more operations and kinds of convolutions so we can run more interesting models. As a first target, I'm aiming at <a href="https://arxiv.org/abs/2004.14525">MobileDet</a>, which, though a bit old by now (it was introduced in 2020), is still the state of the art for object detection on mobile, used for example in <a href="https://frigate.video/">Frigate NVR</a>.</p><p>I haven't mentioned it in a few updates, but all this work keeps being sponsored by <a href="https://libre.computer/">Libre Computer</a>, who are aiming to be the first manufacturer of single board computers to provide accelerated machine learning with open source components. Check out <a href="https://libre.computer/products/aml-a311d-cc/">Alta</a> and <a href="https://libre.computer/products/aml-s905d3-cc/">Solitude</a> for the first such boards on the market.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://libre.computer/api/products/aml-a311d-cc/gallery/1.webp" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="282" src="https://libre.computer/api/products/aml-a311d-cc/gallery/1.webp" width="320" /></a></div><p></p><h3 style="text-align: left;">Upstreaming</h3><div style="text-align: left;"><p>Igalia's Christian Gmeiner has been giving me great feedback on the <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25714">merge request</a>, and as part of that I <a href="https://lore.kernel.org/lkml/20231116140910.1613508-1-tomeu@tomeuvizoso.net/T/#m3047ef1f33ee2ccdfeeaaa38bb8dfd0cfca95bab">submitted a patch</a> to the kernel to retrieve some parameters that are needed when programming the hardware and that are best not left hardcoded. 
</p><p>This means that upstreaming to Mesa loses some urgency, as we are anyway going to have to wait until the merge window for 6.8 opens, after 6.7 final is out.<br /></p></div><h3 style="text-align: left;">Convolutions with 5x5 weights</h3><p>Until now I had implemented support only for weights with dimensions 1x1 (aka <a href="https://arxiv.org/abs/1712.05245">pointwise convolutions</a>) and 3x3 (the most common by far). Some of the convolutions in MobileDet use 5x5 weight tensors, though, so I had to implement support for them. It was a matter of adding some extra complexity to the code that compresses the weight tensors into the format that the hardware expects.</p><p>I implemented this for all kinds of supported convolutions: depthwise, strided, with padding, etc.<br /></p><h3 style="text-align: left;">Tensor addition</h3><p>I observed that the vendor blob implements addition operations with convolution jobs, so I looked deeper and saw that it implements the addition of two input tensors by placing them as the two channels of a single tensor, then passing them through a 1x1 convolution with a specially crafted weight tensor and bias vector.</p><p>This works with hardcoded values for some specific input image dimensions, but I still need to gather more data so I can come up with a generic expression.<br /></p><h3 style="text-align: left;">Softmax pooling</h3><p>One more missing operation commonly used in mobile models is pooling, in its different kinds: average, max, etc.</p><p>The blob implements these operations on the programmable core, with CL-like kernels.</p><p>So I dusted off the work that I did in the <a href="https://blog.tomeuvizoso.net/2023/04/a-long-overdue-update.html">first half of 2023</a> and added code to Teflon for passing these operations to the Gallium drivers. 
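The addition-as-convolution trick described above can be sketched with NumPy. This is a minimal illustration under assumed conventions (an (H, W, C) channels-last layout and a generic C-channel generalization), not the blob's actual weight layout; it relies only on the fact that a 1x1 convolution is a per-pixel matrix multiply:

```python
import numpy as np

def add_via_pointwise_conv(a, b):
    """Add two (H, W, C) tensors using only a 1x1 convolution.

    The inputs are stacked along the channel axis into a single
    (H, W, 2C) tensor, then a 1x1 convolution with a crafted weight
    matrix sums channel i and channel i + C into output channel i.
    """
    c = a.shape[-1]
    x = np.concatenate([a, b], axis=-1)      # (H, W, 2C)
    w = np.zeros((2 * c, c))                 # 1x1 conv weights
    w[np.arange(c), np.arange(c)] = 1.0      # pass through a's channels
    w[np.arange(c) + c, np.arange(c)] = 1.0  # accumulate b's channels
    bias = np.zeros(c)                       # the crafted bias vector
    # A 1x1 convolution is just this matrix multiply applied per pixel.
    return x @ w + bias

a = np.arange(12.0).reshape(2, 2, 3)
b = 10.0 * np.ones((2, 2, 3))
assert np.allclose(add_via_pointwise_conv(a, b), a + b)
```

The appeal of the trick is that the hardware's existing convolution engine does all the work; no dedicated elementwise-add unit is needed.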
Then I added a new kind of operation to the ML backend in Etnaviv to make use of the programmable core.</p><p>Things work fine, even if for now I am storing the kernel machine code in a blob inside the C code. The next step will be to implement the kernel in NIR and generate the machine code using the existing compiler in Etnaviv.</p><p>With this piece of work, we are now able to use all the hardware units in the NPU, and even if the programmable core in this configuration is really underpowered, it will allow us to keep the model in memory close to the NPU, instead of having to ping-pong between the NPU and CPU domains.<br /></p><h3 style="text-align: left;">A new test suite</h3><p>With new operations and kinds of convolutions being added, I was starting to have trouble testing all the possible combinations in a practical way, as the test suite that I had was taking more than 20 minutes for a full run.</p><p>To get around that, I reimplemented the tests in C++ with <a href="https://en.wikipedia.org/wiki/Google_Test">GoogleTest</a>, which is supported by Emma Anholt's <a href="https://gitlab.freedesktop.org/anholt/deqp-runner">deqp-runner</a> and allows me to run the tests in parallel, making full use of the CPU cores in the board.</p><p>That made a big difference, but with so many testing combinations being added (over 3,000 as of now), it was still not fast enough for me. So I remembered an approach that we had been considering to speed up execution of Vulkan and OpenGL conformance tests: caching the golden images that are used to check that the output from the hardware is correct.</p><p>With that, the bottleneck is the network, as I store the cache on NFS, and I can run the full test suite in less than 3 minutes.</p><p>But then I started finding some tests that failed randomly, especially when the cache of test results had already been brought into the filesystem cache on the board. 
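The golden-result caching idea can be sketched roughly like this; a hypothetical Python helper for illustration only (the real test suite is C++ with GoogleTest, and the key scheme, storage format, and directory layout here are all assumptions):

```python
import hashlib
import os
import pickle

def golden_output(cache_dir, params, compute_reference):
    """Return the cached reference ('golden') result for a test case,
    computing and storing it only on the first run.

    `params` identifies the test combination (operation, dimensions,
    strides, ...); `compute_reference` produces the expected output.
    """
    # Derive a stable cache key from the test parameters.
    key = hashlib.sha256(repr(sorted(params.items())).encode()).hexdigest()
    path = os.path.join(cache_dir, key + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)  # cache hit: skip the slow reference run
    result = compute_reference(**params)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result
```

Pointing `cache_dir` at shared storage (like the NFS mount mentioned above) lets every run after the first skip the expensive reference computation entirely.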
After a lot of head-scratching, I came to realize that the Etnaviv kernel driver was trying to submit up to 4 jobs to the hardware at the same time, if userspace was fast enough to enqueue that many jobs before the previous ones had finished.</p><p>There is a <a href="https://elixir.bootlin.com/linux/v6.6.4/source/drivers/gpu/drm/etnaviv/etnaviv_sched.c#L16">kernel module parameter</a> to set the number of jobs that are submitted to the hardware at any given point, and setting it to 1 took me back to rock-solid test results, which is an absolute need for keeping the driver author's sanity.<br /></p><h3 style="text-align: left;">Next steps</h3><p>I have quickly added support for a lot of new operations and parameter combinations, and the code is not as clean as I would like, in part due to the need for some refactoring.</p><p>So in the next few days I will be investing some time in cleaning things up, and afterwards I will move on to more operations in MobileDet.</p><p style="text-align: left;"><br /></p>2023-12-06T10:21:00+00:00Christian Schaller: Fedora Workstation 39 and beyond
https://blogs.gnome.org/uraeus/2023/11/29/fedora-workstation-39-and-beyond/
<p>I have not been very active for a while with writing these Fedora Workstation updates. Part of the reason was that I felt I was beginning to repeat myself a lot, which I partly felt was a side effect of writing them so often, but with some time now since my last update the time felt ripe again. So what are some of the things we have been working on, and what are our main targets going forward? This is not an exhaustive list, but hopefully these are items you find interesting. Apologies for weird sentences and potential spelling mistakes; it ended up being a long post, and when you read your own words over for the Nth time you start going blind to issues :)</p>
<h1><font color="blue">PipeWire</font></h1>
<p><img align="right" alt="PipeWire 1.0 is available!" class="aligncenter size-medium wp-image-10672" height="300" src="https://blogs.gnome.org/uraeus/files/2023/11/Firefly-lines-of-musical-notes-spiraling-46102-300x300.jpg" width="300" /> PipeWire keeps the Linux multimedia revolution rolling. So let’s start with one of your favorite topics, PipeWire. As you probably know, <strong>PipeWire 1.0</strong> is now out and I feel it is a project we definitely succeeded with, so big kudos to Wim Taymans for leading this effort. I think the fact that we got both the creator of JACK, Paul Davis, and the creator of PulseAudio, Lennart Poettering, to endorse it means our goal of unifying the Linux audio landscape is being met. I include their endorsement comments from the PipeWire 1.0 release announcement here:</p>
<blockquote><p> “PipeWire represents the next evolution of audio handling for Linux, taking<br />
the best of both pro-audio (JACK) and desktop audio servers (PulseAudio) and<br />
linking them into a single, seamless, powerful new system.”<br />
– Paul Davis, JACK and Ardour author</p></blockquote>
<blockquote><p> “PipeWire is a worthy successor to PulseAudio, providing a feature set<br />
closer to how modern audio hardware works, and with a security model<br />
with today’s application concepts in mind. Version 1.0 marks a<br />
major milestone in completing the adoption of PipeWire in the standard<br />
set of Linux subsystems. Congratulations to the team!”<br />
– Lennart Poettering, Pulseaudio and systemd author</p></blockquote>
<p>So for new readers, PipeWire is an audio and video server we created for Fedora Workstation to replace PulseAudio for consumer audio and JACK for pro-audio, and to add similar functionality for video to your Linux operating system. So instead of having to deal with two different sound server architectures, users now just have to deal with one, and at the same time they get the same advantages for video handling. Since PipeWire implements both the PulseAudio API and the JACK API, it is a drop-in replacement for both of them without needing any changes to the audio applications built for those two sound servers. Wim Taymans, alongside the amazing community that has grown around the project, has been hard at work maturing PipeWire and adding any missing feature they could find that blocked anyone from moving to it from either PulseAudio or JACK. Wim’s personal focus recently has been on an IRQ-based ALSA driver for PipeWire, to be able to provide 100% performance parity with the old JACK server. So while a lot of pro-audio users felt that PipeWire’s latency was already good enough, this work by Wim shaves off the last few milliseconds to reach the same level of latency as JACK itself had.</p>
<p>In parallel with the work on PipeWire, the community and especially Collabora have been hard at work on the new 0.5 release of WirePlumber, the session manager that handles all policy issues for PipeWire. I know people often get a little confused about PipeWire vs WirePlumber, but think of it like this: PipeWire provides you the ability to output audio through a connected speaker, through a Bluetooth headset, through an HDMI connection and so on, but it doesn’t provide any ‘smarts’ for how that happens. The smarts are instead provided by WirePlumber, which contains policies to decide where to route your audio or video, either based on user choice or through preset policies making the right choices automatically, like moving the audio to your internal speaker if you disconnect your USB speaker. Anyway, WirePlumber 0.5 will be a major step forward, moving from Lua scripts for configuration to JSON, while retaining Lua for scripting. This has many advantages, but I point you to this <a href="https://www.collabora.com/news-and-blog/blog/2022/10/27/from-lua-to-json-refactoring-wireplumber-configuration-system/">excellent blog post by Collabora’s Ashok Sidipotu for the details</a>. Ashok wrote up <a href="https://www.collabora.com/news-and-blog/blog/2023/10/30/wireplumber-exploring-lua-scripts-with-event-dispatcher/">further details about WirePlumber 0.5 that you can find here</a>.</p>
<p>With PipeWire 1.0 out the door, I feel we are very close to reaching one of our initial goals with PipeWire: to remove the need for custom pro-audio distributions like Fedora JAM or Ubuntu Studio, and instead just let audio folks use the same great Fedora Workstation as the rest of the world. With 1.0 done, Wim plans next to look a bit at things like the configuration tools and similar utilities used by pro-audio folks, and also dive more into the Flatpak portal needs of pro-audio applications, to ensure that Flatpaks + PipeWire are the future of pro-audio.</p>
<p>On the video handling side it’s been a little slow going, since applications need to be ported from relying directly on V4L. Jan Grulich has been working with our friends at Mozilla and Google to get PipeWire camera handling support into Firefox and Google Chrome. At the moment it looks like the Firefox support will land first; in fact Jan has set up a <a href="https://copr.fedorainfracloud.org/coprs/jgrulich/Firefox_PipeWire_Camera/">COPR that lets you try it out here</a>. For tracking the upstream work in WebRTC to add PipeWire support, <a href="https://bugs.chromium.org/p/webrtc/issues/detail?id=15654">Jan set up this tracker bug</a>. Getting the web browsers to use PipeWire is important both to enable the advanced video routing capabilities of PipeWire and to provide applications the ability to use <a href="https://libcamera.org/">libcamera</a>, which is needed for modern MIPI cameras to work properly under Linux.</p>
<p>Another important application to get PipeWire camera support into is OBS Studio and the great thing is that community member <a href="https://feaneron.com/">Georges Stavracas</a> is working on getting the PipeWire patches merged into <a href="https://obsproject.com/">OBS Studio</a>, hopefully in time for their planned release early next year. <a href="https://github.com/obsproject/obs-studio/pull/9771">You can track Georges work in this pull request</a>. </p>
<p>For more information about PipeWire 1.0 I recommend our <a href="https://fedoramagazine.org/pipewire-1-0-an-interview-with-pipewire-creator-wim-taymans/">interview with Wim Taymans in Fedora Magazin</a>e and also the <a href="https://linuxunplugged.com/538">interview with Wim on Linux Unplugged podcast</a>.</p>
<p><strong><font color="blue">HDR</font></strong><br />
<img alt="HDR" class="alignright size-medium wp-image-10690" height="300" src="https://blogs.gnome.org/uraeus/files/2023/11/Firefly-swirls-of-bright-colors-reaching-towards-the-sky-10821-300x300.jpg" width="300" />HDR, or High Dynamic Range, is another major effort for us. HDR is a technology I think many of you have become familiar with due to it becoming quite common in TVs these days. It basically provides for greatly increased color depth and luminance on your screen. This change entails a lot of work throughout the stack, because when you introduce it into an existing ecosystem like the Linux desktop you have to figure out how to combine new HDR capable applications and content with old non-HDR applications and content. <a href="https://fosstodon.org/@swick">Sebastian Wick</a>, <a href="https://twitter.com/j_adahl">Jonas Ådahl</a>, <a href="https://en.wikipedia.org/wiki/Olivier_Fourdan">Olivier Fourdan</a>, <a href="https://www.linkedin.com/in/daenzer/">Michel Daenzer</a> and more on the team have been working with other members of the ecosystem from Intel, AMD, NVIDIA, <a href="https://www.collabora.com/">Collabora</a> and more to pick and define the standards and protocols needed in this space. A lot of design work was done early in the year, so we have been quite focused on implementation work across the drivers, Wayland, Mesa, GStreamer, Mutter, GTK+ and more. Some of the more basic scenarios, like running a fullscreen HDR application, are close to being ready, while we are still working hard on getting all the needed pieces together for more complex scenarios like running SDR and HDR windows composited together on your desktop. So getting, for instance, fullscreen games to run in HDR mode with Steam should happen shortly, but the windowed support will probably land closer to summer next year.</p>
<p><strong><font color="blue">Wayland remoting</font></strong><br />
One feature we have also been spending a lot of time on is enabling remote logins to a Wayland desktop. You have been able to share your screen under Wayland more or less from day one, but it required your desktop session to already be active. But let’s say you wanted to access a Wayland desktop running on a headless system; so far you have been out of luck and had to rely on an old X session instead. So putting all the pieces in place for this has been quite an undertaking, with work having been done on PipeWire, Wayland portals, the GNOME remote desktop daemon, <a href="https://gitlab.freedesktop.org/whot/libei">libei</a> (the new input emulation library), GDM and more. The pieces needed are finally falling into place and we expect to have everything needed landed in time for GNOME 46. This support is currently done using a private GNOME API, but a vendor-neutral API is being worked on to replace it. </p>
<p>As a sidenote here, not directly related to desktop remoting: libei has also enabled us to bring XTEST support to XWayland, which was important for various applications including Valve’s gamescope.</p>
<p><strong><font color="blue">NVIDIA drivers</font></strong><br />
One area we keep investing in is improving the state of NVIDIA support on Linux. This comes partly in the form of being the main company backing the continued development of the <a href="https://nouveau.freedesktop.org/">Nouveau</a> graphics driver. The challenge with Nouveau is that for the longest while it offered next to no hardware acceleration for 3D graphics. The reason for this was that the firmware that NVIDIA provided for Nouveau to use didn’t expose that functionality, and since recent generations of NVIDIA cards only work with firmware signed by NVIDIA, this left us stuck. So Nouveau was a good tool for doing an initial install of a system, but if you were doing any kind of serious 3D acceleration, including playing games, then you would need to install the NVIDIA binary driver. In the last year that landscape has changed drastically, with the release of the new out-of-tree open source driver from NVIDIA. Alongside that driver a new firmware has also been made available, one that does provide full support for hardware acceleration.<br />
Let me quickly inject an explanation of out-of-tree versus in-tree drivers here. An in-tree driver is basically a kernel driver for a piece of hardware that has been merged into the official Linux kernel from Linus Torvalds and is thus maintained as part of the official Linux kernel releases. This ensures that the driver integrates well with the rest of the Linux kernel and that it gets updated in sync with the rest of the Linux kernel. So Nouveau is an in-tree kernel driver which also integrates with the rest of the open source graphics stack, like Mesa. The new NVIDIA open source driver is an out-of-tree driver which ships as a separate source code release on its own schedule, though of course NVIDIA works to keep it working with the upstream kernel releases (which is a lot of work and is thus considered a major downside to being an out-of-tree driver).</p>
<p>As of the time of writing this blog post, NVIDIA’s out-of-tree kernel driver and firmware are still a work in progress for display use cases, but that is changing, with NVIDIA exposing more and more display features in the driver (and the firmware) with each new release they do. But if you saw the original announcement of the new open source driver from NVIDIA and have been wondering why no distribution relies on it yet, this is why. So what does this mean for Nouveau? Well, our plan is to keep supporting Nouveau for the foreseeable future because it is an in-tree driver, which makes it a lot easier to ensure it keeps working with each new upstream kernel release. </p>
<p>At the same time the new firmware updates allow Nouveau to eventually offer performance levels competitive with the official out-of-tree driver, kind of like how the open source AMD driver with Mesa offers performance comparable to the AMD binary GPU driver userspace. So Nouveau maintainer Ben Skeggs spent the last year working hard on refactoring Nouveau to work with the new firmware, and we now have a new release of Nouveau out showing the fruits of that labor, enabling support for NVIDIA’s latest chipsets. Over time we will have it cover more chipsets and expand Vulkan and OpenGL (using Zink) support to become a full-fledged accelerated graphics driver.<br />
Some news here: Ben, after having worked tirelessly on keeping Nouveau afloat for so many years, decided he needed a change of pace and has left software development behind for the time being. A big thank you to Ben from all of us at Red Hat and Fedora! The good news is that Danilo Krummrich will take over as the development lead, with Lyude Paul focusing on the display side of the driver. We also expect to have other members of the team chipping in too. They will pick up Ben’s work and continue working with NVIDIA and the community on a bright future for Nouveau.</p>
<p>As I mentioned, though, the new open source driver from NVIDIA is still being matured for the display use case, and until it works fully as a display driver, Nouveau won’t be a full alternative either, since they share the same firmware. So people will need to rely on the binary NVIDIA driver for some time still. One thing we are looking at and discussing is whether there are ways for us to improve the experience of using that binary driver with Secure Boot enabled. At the moment that requires quite a bit of manual fiddling with tools like mokutil, but we have some ideas on how to streamline that a bit. It is a hard nut to crack due to a combination of policy issues, legal issues, security issues and hardware/UEFI bugs, so I am making no promises at this point, just a promise that it is something we are looking at.</p>
<p><strong><font color="blue">Accessibility</font></strong><br />
<img alt="" class="alignright size-medium wp-image-10696" height="300" src="https://blogs.gnome.org/uraeus/files/2023/11/laptopshine-300x300.jpg" width="300" />Accessibility is an important feature for us in Fedora Workstation and thus we hired <a href="https://fedoramagazine.org/accessibility-in-fedora-workstation/">Lukáš Tyrychtr</a> to focus on the issue. Lukáš has been working across the stack, fixing issues blocking proper accessibility support in Fedora Workstation, and has also participated in various accessibility related events. There is still a lot to do there, so I was very happy to hear recently that the GNOME Foundation got a million euro sponsorship from the <a href="https://www.sovereigntechfund.de/">Sovereign Tech Fund</a> to improve various things across the stack, especially accessibility. The combination of Lukáš’s continued efforts and that new investment should make for a much improved accessibility experience in GNOME and in Fedora Workstation going forward. </p>
<p><strong><font color="blue">GNOME Software</font></strong><br />
Another area that we keep investing in is improving GNOME Software, with Milan Crha working continuously on bugfixing and performance improvements. GNOME Software is actually a fairly complex piece of software, as it has to be able to handle the installation and updating of RPMs, OSTree system images, Flatpaks, fonts and firmware for us, in addition to the formats it handles for other distributions. For some time it felt like GNOME Software was struggling with the load of all those different formats and use cases, becoming both slow and prone to error messages. Milan has been dealing with those issues one by one and also recently landed some major performance improvements, making the GNOME Software experience a lot better. One major change that Milan is working on, which I think we will be able to land in Fedora Workstation 40/41, is porting GNOME Software to use DNF5. The main improvement end users will probably notice is that it unifies the caches used by GNOME Software and by dnf on the command line, saving you storage space and also ensuring the two are fully in sync on which RPMs are installed or updated at any given time.</p>
<p><strong><font color="blue">Fedora and Flatpaks</font></strong><br />
<img alt="" class="alignright size-medium wp-image-10702" height="300" src="https://blogs.gnome.org/uraeus/files/2023/11/flatpak-300x300.jpg" width="300" /><br />
Flatpaks are another key element of our strategy for moving the Linux desktop forward, and as part of that we have now enabled all of Flathub to be available if you choose to enable 3rd party repositories when you install Fedora Workstation. This means that the huge universe of applications available on <a href="https://flathub.org/">Flathub</a> will be easy to install through GNOME Software alongside the content available in Fedora’s own repositories. That said, we have also spent time improving the ease of making Fedora Flatpaks. Owen Taylor jumped in and removed the dependency on a technology called ‘<a href="https://asamalik.fedorapeople.org/modularity-docs-faq-pr/modularity/">modularity</a>’, which was initially introduced to Fedora to bring new features around having different types of content and to ease keeping containers up to date. Unfortunately it did not work out as intended, and instead it became something that everyone just felt made things a lot more complicated, including building Flatpaks from Fedora content. With <a href="https://fedoraproject.org/wiki/Changes/FlatpaksWithoutModules">Owen’s updates</a>, building Flatpaks in Fedora has become a lot simpler and should help energize the effort of building Flatpaks in Fedora.</p>
<p><strong><font color="blue">Toolbx</font></strong><br />
<img alt="" class="alignright size-medium wp-image-10759" height="300" src="https://blogs.gnome.org/uraeus/files/2023/11/toolbxtoolbox-300x300.jpg" width="300" />As we continue marching towards our vision of Fedora Workstation as a highly robust operating system, we keep evolving <a href="https://containertoolbx.org/">Toolbx</a>, our tool for running your development environment(s) inside a container, which allows you to keep your host OS pristine and up to date while using specific toolchains and tools inside the development container. This is a hard requirement for immutable operating systems such as <a href="https://fedoraproject.org/silverblue/">Fedora Silverblue</a> or <a href="https://universal-blue.org/">Universal Blue</a>, but it is also useful on operating systems like Fedora Workstation as a way to do development for other platforms, like for instance Red Hat Enterprise Linux. </p>
<p>A major focus for Toolbx since its inception has been to get it to a stage where it is robust and reliable. For instance, while we prototyped it as a shell script, today it is written in Go to be more maintainable and also to conform with the rest of the container ecosystem. A recent major step forward for that stability is that, starting with <a href="https://fedoraproject.org/wiki/Changes/ToolbxReleaseBlocker">Fedora 39, the toolbox image is now a </a><a href="https://docs.fedoraproject.org/en-US/releases/f39/blocking">release blocking deliverable</a>. This means it is now built as part of the nightly compose, and the whole Toolbx stack (i.e. the fedora-toolbox image and the toolbox RPM) is part of the release-blocking test criteria. This shows the level of importance we put on Toolbx as the future of Linux software development and its criticality to Fedora Workstation. Earlier, we built the fedora-toolbox image as a somewhat separate and standalone thing, and people interested in Toolbx would try to test and keep the whole thing working, as much as possible, on their own. This was becoming unmanageable, because Toolbx integrates with many parts of the distribution, from Mutter (i.e. the Wayland and X sockets) to Kerberos to RPM (i.e. %_netsharedpath in /usr/lib/rpm/macros.d/macros.toolbox) to glibc locale definitions and translations. The list of things that could change elsewhere in Fedora, and end up breaking Toolbx, was growing too large for a small group of Toolbx contributors to keep track of.</p>
<p>With the next release we also have built-in support for Arch Linux and Ubuntu through the <code>--distro</code> flag in toolbox.git main. Thanks again to the community contributors who worked with us on this, allowing us to widen the number of distros supported while keeping with our policy of reliability and dependability. And along the same theme of ensuring Toolbx is a tool developers can rely on, we have added lots and lots of new tests. We now have more than 280 tests that run on CentOS Stream 9, all supported Fedoras and Rawhide, and Ubuntu 22.04.</p>
<p>Another feature that Toolbx maintainer Debarshi Ray put a lot of effort into is setting up full RHEL containers in Toolbx on top of Fedora. Today, thanks to Debarshi’s work, you run <code>subscription-manager register --username user@domain.name</code> on the Fedora or RHEL host, and the container is automatically entitled to RHEL content. We are still looking at how we can provide a graphical interface for that process, or at least how to polish up the CLI for doing <code>subscription-manager register</code>. If you are interested in this feature, <a href="https://debarshiray.wordpress.com/2023/08/25/fedora-meets-rhel-upgrading-ubi-to-rhel/">Debarshi provides a full breakdown here.</a></p>
<p>Other nice-to-haves added include support for enterprise FreeIPA set-ups, where the user logs into their machine through Kerberos, and support for automatically generated shell completions for Bash, fish and Z shell.</p>
<p><strong><font color="blue">Flatpak and Foreman & Katello</font></strong><br />
For those out there using <a href="https://theforeman.org/">Foreman</a> to manage your fleet of Linux installs, we have some good news. We are in the process of implementing support for Flatpaks in these tools, so that you can manage and deploy applications in the Flatpak format using them. This is still a work in progress, but the relevant Pulp and Katello commits are the Pulp commit <a href="https://github.com/pulp/pulp_container/commit/eec513314ad9d55843f225802ea9fb45adad16a3">Support for Flatpak index endpoints</a> and the Katello commits <a href="https://github.com/Katello/katello/commit/f6a1688f8078bc50bc1140ca8ea66115cdd9aecf">Reporting results of docker v2 repo discovery</a> and <a href="https://github.com/Katello/katello/commit/e75f1f149504b2dfe3be54a797fe16dee0e86072">Support Link header in docker v2 repo discovery</a>.</p>
<p><strong><font color="blue">LVFS</font></strong><br />
<img alt="" class="alignright size-medium wp-image-10768" height="300" src="https://blogs.gnome.org/uraeus/files/2023/11/firmware-300x300.jpg" width="300" />Another effort that Fedora Workstation has brought to the world of Linux and that is very popular arethe LVFS and fwdup formware update repository and tools. Thanks to that effort we are soon going to be <strong>passing one hundred million firmware updates on Linux devices </strong>soon! These firmware updates has helped resolve countless bugs and much improved security for Linux users. </p>
<p>But we are not slowing down. Richard Hughes worked with industry partners this year to define a <a href="https://uefi.org/blog/firmware-sbom-proposal">Bill of Materials definition for firmware updates</a>, allowing users to be better informed about what is included in their firmware updates.</p>
<p>We now support over 1400 different devices on the LVFS (covering 78 different protocols!), with over 8000 public firmware versions (image below) from over 150 OEMs and ODMs. We’ve now done over 100,000 static analysis tests on over 2,000,000 EFI binaries in the firmware capsules!</p>
<p>Some examples of recently added hardware:<br />
* AMD dGPUs, Navi3x and above, AVer FONE540, Belkin Thunderbolt 4 Core Hub dock, CE-LINK TB4 Docks, CH347 SPI programmer, EPOS ADAPT 1×5, Fibocom FM101, Foxconn T99W373, SDX12, SDX55 and SDX6X devices, Genesys GL32XX SD readers, GL352350, GL3590, GL3525S and GL3525 USB hubs, Goodix Touch controllers, HP Rata/Remi BLE Mice, Intel USB-4 retimers, Jabra Evolve 65e/t and SE, Evolve2, Speak2 and Link devices, Logitech Huddle, Rally System and Tap devices, Luxshare Quad USB4 Dock, MediaTek DP AUX Scalers, Microsoft USB-C Travel Hub, More Logitech Unifying receivers, More PixartRF HPAC devices, More Synaptics Prometheus fingerprint readers, Nordic HID devices, nRF52 Desktop Keyboard, PixArt BLE HPAC OTA, Quectel EM160 and RM520, Some Western Digital eMMC devices, Star Labs StarBook Mk VIr2, Synaptics Triton devices, System76 Launch 3, Launch Heavy 3 and Thelio IO 2, TUXEDO InfinityBook Pro 13 v3, VIA VL122, VL817S, VL822T, VL830 and VL832, Wacom Cintiq Pro 27, DTH134 and DTC121, One 13 and One 12 Tablets</p>
<p><strong><font color="blue">InputLeap on Wayland</font></strong><br />
One really interesting feature that landed for Fedora Workstation 39 was support for InputLeap. It’s probably not on most people’s radar, but it’s an important feature for system administrators, developers and generally anyone with more than a single computer on their desk.</p>
<p><a href="https://github.com/input-leap/input-leap">InputLeap</a> is a fork of <a href="https://github.com/debauchee/barrier">Barrier</a>, which itself was a fork of <a href="https://synergy-project.org/">Synergy</a>. It allows sharing the same input devices (mouse, keyboard) across different computers (Linux, Windows, macOS) and moving the pointer between the screens of these computers seamlessly, as if they were one.</p>
<p>InputLeap has a client/server architecture, with the server running on the main host (the one with the keyboard and mouse connected) and multiple clients, the other machines sitting next to the server machine. That implies two things: the InputLeap daemon on the server must be able to “capture” all the input events and forward them to the remote clients when the pointer reaches the edge of the screen, and the InputLeap client must be able to “replay” those input events on the client host, as if the keyboard and mouse were connected directly to that (other) computer. Historically, that relied on X11 mechanisms, and neither InputLeap (nor Barrier, or even Synergy for that matter) would work on Wayland.</p>
<p>This is one of the use cases that Peter Hutterer had in mind when he started <a href="https://gitlab.freedesktop.org/libinput/libei">libEI</a>, a low-level library aimed at providing a separate communication channel for input emulation in Wayland compositors and clients (even though libEI is not strictly tied to Wayland). But libEI alone is far from sufficient to implement the InputLeap features; with Wayland we had the opportunity to make things more secure than X11 and to benefit from the XDG portal mechanisms.</p>
<p>On the client side, for replaying input events, it’s similar to remote desktop, but we needed to <a href="https://github.com/flatpak/xdg-desktop-portal/pull/762">update the existing RemoteDesktop portal to pass the libEI socket</a>. On the server side, it required a <a href="https://github.com/flatpak/xdg-desktop-portal/pull/714">brand new portal for input capture</a>. These also required their counterparts in the GNOME portal, for both <a href="https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/merge_requests/97">RemoteDesktop</a> and <a href="https://gitlab.gnome.org/GNOME/xdg-desktop-portal-gnome/-/merge_requests/61">InputCapture</a>, and of course, all of that needs to be supported by the Wayland compositor; in the case of GNOME, that’s <a href="https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2628">mutter</a>. That alone was a lot of work.</p>
<p>Yet, even with all that in place, these are just the basic requirements for a Synergy/Barrier/InputLeap-like feature; the tools in question need to implement support for the portal and libEI to benefit from the mechanisms we’ve put in place and for the whole feature to work and be usable. So libportal was also updated to support the new portal features, and a new “Wayland” backend alongside the X11, Windows and Mac OS backends was <a href="https://github.com/input-leap/input-leap/pull/1594">contributed to InputLeap</a>.</p>
<p>The merge request in InputLeap was accepted very early, even before the libEI API was completely stabilized and before the rest of the stack was merged, which I believe was a courageous choice from Povilas (who maintains InputLeap) which helped reduce the time to have the feature actually working, considering the number of components and inter-dependencies involved. Of course, there are still features missing in the Wayland backend, like copy/pasting between hosts, but a <a href="https://github.com/flatpak/xdg-desktop-portal/pull/852">clipboard interface was fairly recently added</a> to the remote desktop portal and therefore could be used by InputLeap to implement that feature.</p>
<p>Fun fact: Xwayland also grew support for libEI, using the <a href="https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/975">remote desktop portal</a>, and wires that to the XTEST extension on X11 that InputLeap’s X11 backend uses. So it might even be possible to use the X11 backend of InputLeap on the client side through Xwayland, but of course it’s better to use the Wayland backend on both the client and server sides.</p>
<p>InputLeap is a great example of collaboration between multiple parties upstream, including key contributions from us at Red Hat, to implement and contribute a feature that has been requested for years upstream.</p>
<p><em><strong>Thank you to Olivier Fourdan, Debarshi Ray, Richard Hughes, Sebastian Wick and Jonas Ådahl for their contributions to this blog post.</strong></em></p>2023-11-29T17:55:08+00:00Tomeu Vizoso: Etnaviv NPU update 11: Now twice as fast!
https://blog.tomeuvizoso.net/2023/11/etnaviv-npu-update-11-now-twice-as-fast.html
<h1 style="text-align: left;">Progress</h1><div style="text-align: left;"> </div><div style="text-align: left;">This update's highlight is that last week I finally got the TP jobs working, which allows us to make the tensor manipulation in the HW, removing 18ms from the tensor preprocessing. We can currently use them for transposing tensors from the format that TensorFlow prefers to that which the HW expects and the other way around, and for lowering strided convolutions to regular ones.<br /></div><div style="text-align: left;"> </div><div style="text-align: left;">This makes our image classification benchmark twice as fast, as expected:<br /></div><p><span style="font-family: courier;">tomeu@arm-64:~/mesa$ ETNA_MESA_DEBUG=ml_msgs python3.10 classification.py -i grace_hopper.bmp -m mobilenet_v1_1.0_224_quant.tflite -l labels_mobilenet_quant_v1_224.txt -e libteflon.so<br />Loading external delegate from build/src/gallium/targets/teflon/libteflon.so with args: {}<br /><b>Running the NN job took 13 ms.</b><br />0.866667: military uniform<br />0.031373: Windsor tie<br />0.015686: mortarboard<br />0.007843: bow tie<br />0.007843: academic gown<br /><b>time: 15.650ms</b><br /></span></p><div style="text-align: left;">60 FPS is already quite interesting for many use cases, but the proprietary driver is able to do the same at around 8 ms, so there is still plenty of room for improvements.</div><div style="text-align: left;"> </div><div style="text-align: left;">Some preliminary testing indicates that enabling zero-run length compression in the weight buffers will make the biggest difference, so that is what I will be working on when I get back to performance work.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Additionally, I also got some experimental jobs running on the programmable core in this NPU, which will allow us to run more advanced models, which tend to use operations that the hardware couldn't be designed for back 
then.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Upstreaming is going well, those interested can follow it here:</div><div style="text-align: left;"> </div><div style="text-align: left;"><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25714">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25714</a>.<br /></div><div style="text-align: left;"> </div><h1 style="text-align: left;">Next steps</h1><div style="text-align: left;"> </div><p>These will be my priorities during the next couple of weeks, in order:</p><ol style="text-align: left;"><li>Upstreaming</li><li>Get the Mobilenet SSD V1 model running on the HW, for object detection<br /></li><li>Performance<br /></li></ol>2023-11-17T07:46:00+00:00Simon Ser: Status update, November 2023
https://emersion.fr/blog/2023/status-update-58/
<p>Hi! This month I’ve started a new <abbr title="Project of the Month">PotM</abbr>
called <a href="https://sr.ht/~emersion/pyonji/">pyonji</a>. It’s an easy-to-use replacement for the venerable
<code>git-send-email</code> command. The goal is to make it less painful for a new
contributor who is not familiar with e-mail-based patch submission to submit
patches.</p>
<p><a href="https://asciinema.org/a/620880" target="_blank"><img class="opaque" src="https://asciinema.org/a/620880.svg" /></a></p>
<p>Users are expected to use the same workflow as GitHub, GitLab and friends when
contributing: create a new branch and add commits there. Instead of pushing to
a fork though, users simply invoke <code>pyonji</code>.</p>
<p>When run for the first time, pyonji will ask for your e-mail account details:
e-mail address, password… and nothing else. The SMTP server hostname, port and
other details are automatically detected (via multiple means: SRV records,
Mozilla auto-configuration database, common subdomains, etc). Once the password
is verified pyonji will store everything in the Git configuration (in the same
fashion that git-send-email expects it).</p>
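<p>For illustration, the common-subdomain part of that detection could be sketched roughly like this. This is a hypothetical helper, not pyonji’s actual code; the real tool also consults SRV records and Mozilla’s autoconfiguration database before falling back to guesses like these:</p>

```python
import smtplib


def candidate_smtp_hosts(address):
    """Guess likely SMTP submission hosts for an e-mail address."""
    domain = address.rsplit("@", 1)[1]
    # Common conventions: smtp.example.org, mail.example.org,
    # then the bare domain itself as a last resort.
    return [f"smtp.{domain}", f"mail.{domain}", domain]


def probe(host, port=587, timeout=5):
    """Return True if an SMTP server answers on host:port."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.noop()
        return True
    except OSError:
        return False


print(candidate_smtp_hosts("user@example.org"))
```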
<p>Then pyonji will present a UI with a list of commits to be submitted for
review. The user can tweak details such as the base branch, the mailing list
address, the version of the patch, however that’s rarely needed: pyonji will
find good defaults for these. The user can add a cover letter if desired with
a longer description for the set of patches. Then the big blue “submit” button
can be pressed to send the patches.</p>
<p>Unlike git-send-email, pyonji will remember for you what the last submitted
version number was (and automatically increment it). pyonji will save the cover
letter so that it’s not lost if the network is flaky and you don’t need to
re-type it for the next submission. pyonji will not waste your time with
uninteresting questions such as “which encoding should I use?”. pyonji will
automatically include the <a href="https://git-scm.com/docs/git-format-patch#_base_tree_information">base tree information</a> in the patches so that any
conflicts are more easily resolved by the reviewer.</p>
<p>Please try it and let me know how it goes! In particular, I’m wondering if the
logic to auto-detect the e-mail server settings is robust enough, or if there
are e-mail providers I don’t handle correctly yet.</p>
<p>There is still a lot to be done to improve pyonji. Setup is painful for GMail
and Fastmail users because app passwords are required. I wanted to use OAuth to
fix this but both of these providers heavily restrict how SMTP OAuth apps can
be registered. Setup doesn’t work for ProtonMail users because the bridge uses
a self-signed certificate; that can be fixed, but setup will remain painful.
I’d like to add a UI to change the base branch, improve the heuristics to pick a
good default for the base branch, add support for the MAINTAINERS file for easier
contribution to big projects such as the kernel, add an easy way to mark a
patch series as RFC, and probably a million other things.</p>
<p>Apart from pyonji, I’ve been working on some graphics-related stuff as always.
We’re getting closer to the wlroots 0.17 release, fixing the few remaining
blocking issues. A <a href="https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4131">new API</a> to clip surfaces with the
scene-graph has been merged, many thanks to Alexander Orzechowski and Isaac
Freund! I’ve <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26205">fixed a Mesa regression</a> introduced by a previous
patch I’ve reviewed related to EGL and split render/display SoCs (I hate these).
And I’ve been <a href="https://lore.kernel.org/dri-devel/20231109074545.148149-1-contact@emersion.fr/T/#m8f4e6718d387ab509b41a27c82534ba0a4b03ff5">discussing</a> with other kernel developers about a way
to stop (ab)using KMS dumb buffers for split render/display SoCs (I swear I
really hate these). We’re trying to come up with a solution which could in the
long run also help with the Buffer Allocation Constraints Problem (see the
<a href="https://lpc.events/event/9/contributions/615/">XDC 2020 talk</a> for more info).</p>
<p>I’ve written a few patches to add support for OAuth 2.0 refresh tokens to
meta.sr.ht. If you’ve ever used an OAuth sr.ht app (like hottub or yojo to
integrate builds.sr.ht with GitHub or Forgejo), you probably know that tokens
expire after one year, and that you need to redo the setup step when that
happens. This is annoying, and adding support for refresh tokens to meta.sr.ht
and the OAuth apps should fix this.</p>
<p>Last, I’m now part of the <a href="https://www.freedesktop.org/wiki/CodeOfConduct/">FreeDesktop Code of Conduct</a> team. This is not a
technical role, but it’s very important to have folks doing this work. I’ve
attended a Code of Conduct workshop to learn how to do it, which has been pretty
interesting and helpful. The workshop focused a lot more on trying to change
people’s behavior, instead of bringing down the ban hammer.</p>
<p>That’s all for now, see you next month!</p>2023-11-15T22:00:00+00:00Hari Rana: Rewriting nouveau’s Website
https://tesk.page/2023/11/15/rewriting-nouveaus-website/
<h2 id="introduction">Introduction</h2>
<p>We spent a whole week rewriting the website of nouveau, the open-source drivers for NVIDIA cards. It started as a one-person effort, but a few people ended up helping me out. We addressed several issues in the nouveau website and improved it a lot. The redesign is live on <a href="https://nouveau.freedesktop.org">nouveau.freedesktop.org</a>.</p>
<p>In this article, we’ll go over the problems with the old site and the work we’ve done to fix them.</p>
<h2 id="problems-with-old-website">Problems With Old Website</h2>
<p>I’m going to use this <a href="https://web.archive.org/web/20231107143009/https://nouveau.freedesktop.org">archive</a> as a reference for the old site.</p>
<p>The biggest problem with the old site was that the HTML and CSS were written 15 years ago and had never been updated since. So in 2023, we were relying on outdated HTML/CSS code. Obviously, this was no fun from a reader’s perspective. With the technical debt and lack of interest, we were suffering from several problems. The only good thing about the old site was that it didn’t use JavaScript, which is something I wanted to keep for the rewrite.</p>
<p>Fun fact: the template was so old that it could be built for browsers that don’t support HTML5!</p>
<h3 id="not-responsive">Not Responsive</h3>
<p>“Responsive design” in web design means making the website accessible on a variety of screen sizes. In practice, a website should adapt to work on mobile devices, tablets, and laptops/computer monitors.</p>
<p>In the case of the nouveau website, it didn’t support mobile screen sizes properly. Buttons were hard to tap and text was small. Here are some screenshots taken in Firefox on my Razer Phone 2:</p>
<figure>
<img alt="" src="https://tesk.page/assets/nouveau-website-1.webp" />
<p>Small buttons and text in the navigation bar that are difficult to read and tap.</p>
</figure>
<figure>
<img alt="" src="https://tesk.page/assets/nouveau-website-2.webp" />
<p>Small text in a table that forces the reader to zoom in.</p>
</figure>
<h3 id="no-dark-style">No Dark Style</h3>
<p>Regardless of style preferences, having a dark style/theme can help people who are sensitive to light, and it can also save battery on AMOLED displays. For those who absolutely need them, dark styles are not just a nice-to-have.</p>
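<p>For reference, honoring the reader’s preference takes only a media query in plain CSS. This is an illustrative sketch, not the actual rules the new site uses:</p>

```css
:root {
  --bg: #ffffff;
  --fg: #241f31;
}

/* Switch the palette when the browser reports a dark preference */
@media (prefers-color-scheme: dark) {
  :root {
    --bg: #241f31;
    --fg: #ffffff;
  }
}

body {
  background: var(--bg);
  color: var(--fg);
}
```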
<h3 id="no-seo">No SEO</h3>
<p>Search Engine Optimization (SEO) is the process of making a website more discoverable on search engines like Google. Various elements, such as the title, description, and icon, are used to improve a site’s ranking in search engines.</p>
<p>In the case of nouveau, there were no SEO efforts. If we look at the old nouveau homepage’s <code class="language-plaintext highlighter-rouge"><head></code> element, we get the following:</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><head></span>
<span class="nt"><meta</span> <span class="na">charset=</span><span class="s">"utf-8"</span><span class="nt">></span>
<span class="nt"><title></span>nouveau<span class="nt"></title></span>
<span class="nt"><link</span> <span class="na">rel=</span><span class="s">"stylesheet"</span> <span class="na">href=</span><span class="s">"style.css"</span> <span class="na">type=</span><span class="s">"text/css"</span><span class="nt">></span>
<span class="nt"><link</span> <span class="na">rel=</span><span class="s">"stylesheet"</span> <span class="na">href=</span><span class="s">"xorg.css"</span> <span class="na">type=</span><span class="s">"text/css"</span><span class="nt">></span>
<span class="nt"><link</span> <span class="na">rel=</span><span class="s">"stylesheet"</span> <span class="na">href=</span><span class="s">"local.css"</span> <span class="na">type=</span><span class="s">"text/css"</span><span class="nt">></span>
<span class="nt"><link</span> <span class="na">rel=</span><span class="s">"alternate"</span> <span class="na">type=</span><span class="s">"application/x-wiki"</span> <span class="na">title=</span><span class="s">"Edit this page"</span> <span class="na">href=</span><span class="s">"https://gitlab.freedesktop.org/nouveau/wiki/-/edit/main/sources/index.mdwn"</span><span class="nt">></span>
<span class="nt"></head></span>
</code></pre></div></div>
<p>The only thing there was a title, which is, obviously, far from desirable. The rest were CSS stylesheets, wiki source link, and character set.</p>
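<p>For comparison, giving search engines something to work with only takes a few extra lines in the <code class="language-plaintext highlighter-rouge">&lt;head&gt;</code>. This is an illustrative sketch; the description text and icon path are made up, not what the new site actually ships:</p>

```html
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>nouveau</title>
  <meta name="description" content="nouveau: open-source drivers for NVIDIA graphics cards">
  <link rel="icon" href="favicon.svg">
  <link rel="stylesheet" href="style.css" type="text/css">
</head>
```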
<h3 id="readability-issues">Readability Issues</h3>
<p>One of the biggest problems with nouveau’s website (apart from the homepage) is the lack of a maximum width. Large paragraphs stretch across the screen, making it difficult to read.</p>
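<p>The usual fix is a single rule capping the content column. The selector and values below are assumptions for illustration, not the site’s actual stylesheet:</p>

```css
main {
  max-width: 80ch;  /* keep line length comfortable for reading */
  margin: 0 auto;   /* center the column on wide screens */
  padding: 0 1rem;  /* keep some breathing room on narrow screens */
}
```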
<h2 id="process-of-rewriting">Process of Rewriting</h2>
<p>Before I started the redesign, I talked to <a href="https://gitlab.com/karolherbst">Karol Herbst</a>, one of the nouveau maintainers. He had been wanting to redesign the nouveau site for ages, so I asked myself, “How hard can it be?” Well… mistakes were made.</p>
<p>The first step was to look at the repository and learn about the tools freedesktop.org uses for their website. freedesktop.org uses <a href="https://ikiwiki.info">ikiwiki</a> to generate the wiki. Problem is: it’s slow and really annoying to work with. The first thing I did was create a Fedora <a href="https://containertoolbx.org">toolbox</a> container. I installed the <code class="language-plaintext highlighter-rouge">ikiwiki</code> package to generate the website locally.</p>
<p>The second step was to rewrite the CSS and HTML template. I took a look at <a href="https://gitlab.freedesktop.org/nouveau/wiki/-/blob/0394b9e0a482f2ba69c3ec798ebe171bda435052/templates/page.tmpl"><code class="language-plaintext highlighter-rouge">page.tmpl</code></a> — the boilerplate. While looking at it, I discovered another problem: the template is unreadable. So I worked on that as well.</p>
<p>I ported to modern HTML elements, like <code class="language-plaintext highlighter-rouge"><nav></code> for the navigation bar, <code class="language-plaintext highlighter-rouge"><main></code> for the main content, and <code class="language-plaintext highlighter-rouge"><footer></code> for the footer.</p>
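<p>In template terms, the page skeleton then looks roughly like this (a simplified sketch, not the literal contents of <code class="language-plaintext highlighter-rouge">page.tmpl</code>):</p>

```html
<body>
  <nav><!-- navigation bar links --></nav>
  <main><!-- the rendered wiki page --></main>
  <footer><!-- license and "edit this page" links --></footer>
</body>
```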
<p>The third step was to rewrite the CSS. In the <code class="language-plaintext highlighter-rouge"><head></code> tag above, we can see that the site pulls CSS from many sources: <code class="language-plaintext highlighter-rouge">style.css</code>, <code class="language-plaintext highlighter-rouge">xorg.css</code>, and <code class="language-plaintext highlighter-rouge">local.css</code>. So what I did was to delete <code class="language-plaintext highlighter-rouge">xorg.css</code> and <code class="language-plaintext highlighter-rouge">local.css</code>, delete the contents of <code class="language-plaintext highlighter-rouge">style.css</code>, and rewrite it from scratch. I copied a few things from <a href="https://gnome.pages.gitlab.gnome.org/libadwaita">libadwaita</a>, namely its buttons and <a href="https://gnome.pages.gitlab.gnome.org/libadwaita/doc/main/named-colors.html">colors</a>.</p>
<p>And behold… <a href="https://gitlab.freedesktop.org/nouveau/wiki/-/merge_requests/29">merge request !29</a>!</p>
<p>Despite the success of the rewrite, I ran into a few roadblocks. I couldn’t figure out how to make the freedesktop.org logo dark style. Luckily, my friend <a href="https://kramo.hu">kramo</a> helped me out by providing an SVG file of the logo that adapts to dark style, based on <a href="https://upload.wikimedia.org/wikipedia/commons/e/e2/Freedesktop-logo.svg">Wikipedia</a>’s. They also adjusted the style of the website to make it look nicer.</p>
<p>I also couldn’t figure out what to do with the tables because the colors were low contrast. Also, the large table on the <a href="https://nouveau.freedesktop.org/FeatureMatrix.html">Feature Matrix</a> page was limited in maximum width, which would make it uncomfortable on large monitors. <a href="https://ordinary.cafe/@lea">Lea</a> from <a href="https://fyralabs.com">Fyra Labs</a> helped with the tables and fixed the problems. She also adjusted the style.</p>
<p>After that, the rewrite was <em>mostly</em> done. Some reviewers came along and suggested some changes. Karol wanted the rewrite so <a href="https://chaos.social/@karolherbst/111415984079049802">badly</a> that he opened a <a href="https://chaos.social/@karolherbst/111415985859773085">poll</a> asking if he should merge it. It was an overwhelming yes, so… it got merged!</p>
<h2 id="conclusion">Conclusion</h2>
<p>As Karol puts it:</p>
<figure>
<img alt="" src="https://tesk.page/assets/karol-crying.webp" />
<p>“check out the nouveau repo, then cry, then reconsider your life choices”</p>
</figure>
<p>In all seriousness, I’ve had a great time working on it. While this is the nouveau site in particular, I plan to eventually rewrite the entire freedesktop.org site. However, I started with nouveau because it was hosted on GitLab. Meanwhile, other sites/pages are hosted on freedesktop.org’s cgit instance, which were largely inaccessible for me to contribute to.</p>
<p>Ideally, we’d like to move from ikiwiki to something more modern, like a framework or a better generator, but we’ll have to see who’s willing to work on it and maintain it.</p>2023-11-15T00:00:00+00:00Matthias Klumpp: AppStream 1.0 released!
https://blog.tenstral.net/2023/11/appstream-1-0-released.html
<p>Today, 12 years after the meeting where AppStream was first discussed and 11 years after I released a prototype implementation, I am excited to announce <strong>AppStream 1.0</strong>! <img alt="🎉" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f389.png" style="height: 1em;" /><img alt="🎉" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f389.png" style="height: 1em;" /><img alt="🎊" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f38a.png" style="height: 1em;" /></p>
<p>Check it out <a href="https://github.com/ximion/appstream#readme">on GitHub</a>, or <a href="https://www.freedesktop.org/software/appstream/releases/">get the release tarball</a> or <a href="https://www.freedesktop.org/software/appstream/docs/">read the documentation</a> or <a href="https://github.com/ximion/appstream/commit/4be36d5f4a6bc401efc84cd6e2d390a59c304115">release notes</a>! <img alt="😁" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f601.png" style="height: 1em;" /></p>
<h2 class="wp-block-heading">Some nostalgic memories</h2>
<p>I was not in the <a href="https://www.freedesktop.org/wiki/Distributions/Meetings/AppInstaller2011/">original AppStream meeting</a>, since in 2011 I was extremely busy with finals preparations and ball organization in high school, but I still vividly remember sitting at school in the students’ lounge during a break and trying to catch the really choppy live stream from the meeting on my borrowed laptop (a futile exercise, I watched parts of the blurry recording later).</p>
<p>I was extremely passionate about getting software deployment to work better on Linux and to improve the overall user experience, and spent many hours on the <a href="https://www.freedesktop.org/software/PackageKit/">PackageKit</a> IRC channel discussing things with many amazing people like Richard Hughes, Daniel Nicoletti, Sebastian Heinlein and others.</p>
<p>At the time I was writing a software deployment tool called Listaller – this was before Linux containers were a thing, and building it was very tough due to technical and personal limitations (I had just learned C!). Then in university, when I intended to recreate this tool, but for real and better this time as a new project called Limba, I needed a way to provide metadata for it, and AppStream fit right in! Meanwhile, Richard Hughes was tackling the UI side of things while creating GNOME Software and needed a solution as well. So I implemented a prototype and together we pretty much reshaped the early specification from the original meeting into what would become modern AppStream.</p>
<p>Back then I saw AppStream as a necessary side-project for my actual project, and didn’t even consider myself the maintainer of it for quite a while (I hadn’t been at the meeting after all). All those years ago I had no idea that ultimately I was developing AppStream not for Limba, but for a new thing that would show up later, with an even more modern design, called <a href="https://flatpak.org/">Flatpak</a>. I also had no idea how incredibly complex AppStream would become, how many features it would have, how much more maintenance work it would be, nor how ubiquitous it would become.</p>
<p>The modern Linux desktop uses AppStream everywhere now, it is supported by all major distributions, used by Flatpak for metadata, used for firmware metadata via Richard’s <a href="https://fwupd.org/">fwupd/LVFS</a>, runs on every Steam Deck, can be found in cars and possibly many places I do not know yet.</p>
<h2 class="wp-block-heading">What is new in 1.0?</h2>
<h3 class="wp-block-heading">API breaks</h3>
<p>The most important thing that’s new with the 1.0 release is a bunch of incompatible changes. For the shared libraries, all deprecated API elements have been removed and a bunch of other changes have been made to improve the overall API and especially make it more binding-friendly. That doesn’t mean that the API is completely new and nothing looks like before though, when possible the previous API design was kept and some changes that would have been too disruptive have not been made. Regardless of that, you will have to port your AppStream-using applications. For some larger ones I already submitted patches to build with both AppStream versions, the 0.16.x stable series as well as 1.0+.</p>
<p>For the XML specification, some older compatibility for XML that had no or very few users has been removed as well. This affects for example <code>release</code> elements that reference downloadable data without an <code>artifact</code> block, which has not been supported for a while. For all of these, I checked to remove only things that had close to no users and that were a significant maintenance burden. So as a rule of thumb: If your XML validated with no warnings with the 0.16.x branch of AppStream, it will still be 100% valid with the 1.0 release.</p>
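<p>As an example of the still-supported form, a downloadable release describes its data inside an <code>artifact</code> block. The values below are made up for illustration:</p>

```xml
<releases>
  <release version="1.0" date="2023-11-11">
    <artifacts>
      <artifact type="binary" platform="x86_64-linux-gnu">
        <location>https://example.org/example-app-1.0.tar.xz</location>
      </artifact>
    </artifacts>
  </release>
</releases>
```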
<p>Another notable change is that the generated output of AppStream 1.0 will always be 1.0 compliant; you cannot make it generate data for versions below that (this greatly reduced the maintenance cost of the project).</p>
<figure class="wp-block-image size-full"><a href="https://blog.tenstral.net/wp-content/uploads/2023/11/refactoring-code-cat.gif"><img alt="" class="wp-image-2012" height="401" src="https://blog.tenstral.net/wp-content/uploads/2023/11/refactoring-code-cat.gif" width="498" /></a></figure>
<h3 class="wp-block-heading">Developer element</h3>
<p>For a long time, you could set the developer name using the top-level <code>developer_name</code> tag. With AppStream 1.0, this changes a bit. There is now a <code>developer</code> tag with a <code>name</code> child (which can be translated unless the <code>translate="no"</code> attribute is set on it). This allows for future extensibility, and also allows setting a machine-readable <code>id</code> attribute in the <code>developer</code> element. This permits software centers to group software by developer more easily, without having to resort to heuristics. If we decide to extend the per-app developer information in the future, this is now possible as well. Do not worry though: the <code>developer_name</code> tag is still read, so there is no high pressure to update. The old 0.16.x stable series also has this feature backported, so it can be available everywhere. Check out the <a href="https://www.freedesktop.org/software/appstream/docs/chap-Metadata.html#tag-developer">developer tag specification</a> for more details.</p>
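<p>To illustrate, a MetaInfo snippet using the new tag could look like this (the names and id are examples):</p>
<pre><code>&lt;!-- old style, still read --&gt;
&lt;developer_name&gt;Example Project&lt;/developer_name&gt;

&lt;!-- new style since AppStream 1.0 --&gt;
&lt;developer id="example.org"&gt;
  &lt;name translate="no"&gt;Example Project&lt;/name&gt;
&lt;/developer&gt;
</code></pre>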
<h3 class="wp-block-heading">Scale factor for screenshots</h3>
<p>Screenshot images can now have a <code>scale</code> attribute, to indicate an (integer) scaling factor to apply. This feature was a breaking change and therefore we could not have it for the longest time, but it is now available. Please wait a bit for AppStream 1.0 to become more widely deployed though, as using it with older AppStream versions may lead to issues in some cases. Check out the <a href="https://www.freedesktop.org/software/appstream/docs/chap-Metadata.html#tag-screenshots">screenshots tag specification</a> for more details.</p>
<h3 class="wp-block-heading">Screenshot environments</h3>
<p>It is now possible to indicate the environment a screenshot was recorded in (GNOME, GNOME Dark, KDE Plasma, Windows, etc.) via an <code>environment</code> attribute on the respective <code>screenshot</code> tag. This was also a breaking change, so use it carefully for now! If projects want to, they can use this feature to supply dedicated screenshots depending on the environment the application page is displayed in. Check out the <a href="https://www.freedesktop.org/software/appstream/docs/chap-Metadata.html#tag-screenshots">screenshots tag specification</a> for more details.</p>
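<p>Combining both new screenshot attributes, an entry might look like this (the URL and attribute values are illustrative; check the linked specification for the allowed values):</p>
<pre><code>&lt;screenshot type="default" environment="plasma"&gt;
  &lt;image type="source" scale="2"&gt;https://example.org/shots/main-hidpi.png&lt;/image&gt;
&lt;/screenshot&gt;
</code></pre>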
<h3 class="wp-block-heading">References tag</h3>
<p>This is a feature more important for the scientific community and scientific applications. Using the <code>references</code> tag, you can associate the AppStream component with a DOI (<a href="https://en.wikipedia.org/wiki/Digital_object_identifier">Digital object identifier</a>) or provide a link to a <a href="https://citation-file-format.github.io/">CFF file</a> to provide citation information. It also allows linking to other scientific registries. Check out the <a href="https://www.freedesktop.org/software/appstream/docs/chap-Metadata.html#tag-references">references tag specification</a> for more details.</p>
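<p>A sketch of what this can look like in MetaInfo data (the DOI and URL are placeholders, and the exact type names should be checked against the linked specification):</p>
<pre><code>&lt;references&gt;
  &lt;reference type="doi"&gt;10.1000/example-doi&lt;/reference&gt;
  &lt;reference type="citation_cff"&gt;https://example.org/CITATION.cff&lt;/reference&gt;
&lt;/references&gt;
</code></pre>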
<h3 class="wp-block-heading">Release tags</h3>
<p>Releases can have <a href="https://www.freedesktop.org/software/appstream/docs/sect-Metadata-Releases.html#tag-release-tags">tags</a> now, just like components. This is generally not a feature that I expect to be used much, but in certain instances it can become useful with a cooperating software center, for example to tag certain releases as long-term supported versions.</p>
<h3 class="wp-block-heading">Multi-platform support</h3>
<p>Thanks to the interest and work of many volunteers, AppStream (mostly) runs on FreeBSD now, a NetBSD port exists, support for macOS was written and a Windows port is on its way! Thank you to everyone working on this <img alt="🙂" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f642.png" style="height: 1em;" /></p>
<h3 class="wp-block-heading">Better compatibility checks</h3>
<p>For a long time I thought that the AppStream library should just be a thin layer above the XML and that software centers should implement most of the actual logic themselves. This has not been the case for a while, but there were still a lot of complex AppStream features that were hard for software centers to implement and where it made sense to have one implementation that projects can just use.</p>
<p>The validation of component relations is one such thing. This was implemented in 0.16.x as well, but 1.0 vastly improves upon the compatibility checks, so you can now just run <a href="https://www.freedesktop.org/software/appstream/docs/api/method.Component.check_relations.html">as_component_check_relations</a> and retrieve a detailed list of whether the current component will run well on the system. Besides better API for software developers, the <code>appstreamcli</code> utility also has much improved support for relation checks, and I wrote about these changes <a href="https://blog.tenstral.net/2023/10/how-to-indicate-device-compatibility-for-your-app-in-metainfo-data.html">in a previous post</a>. Check it out!</p>
<p>With these changes, I hope this feature will be used much more, and beyond just drivers and firmware.</p>
<h3 class="wp-block-heading">So much more!</h3>
<p>The changelog for the 1.0 release is huge, and there are many papercuts resolved and changes made that I did not talk about here, like us using gi-docgen (instead of gtkdoc) now for <a href="https://www.freedesktop.org/software/appstream/docs/api/">nice API documentation</a>, or the many improvements that went into better binding support, or better search, or just plain bugfixes.</p>
<h2 class="wp-block-heading">Outlook</h2>
<p>I expect the transition to 1.0 to take a bit of time. AppStream has not broken its API for many, many years (since 2016), so a bunch of places need to be touched even if the changes themselves are minor in many cases. In hindsight, I should have also released 1.0 much sooner and it should not have become such a mega-release, but that was mainly due to time constraints.</p>
<p>So, what’s in it for the future? Contrary to what I thought, AppStream does not really seem to be “done” and feature complete at any point; there is always something to improve, and people come up with new use cases all the time. So, expect more of the same in the future: bugfixes, validator improvements, documentation improvements, better tools and the occasional new feature.</p>
<p>Onwards to 1.0.1! <img alt="😁" class="wp-smiley" src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f601.png" style="height: 1em;" /></p>
<p></p>2023-11-11T19:48:00+00:00Peter Hutterer: PSA: For Xorg GNOME sessions, use the xf86-input-wacom driver for your tablets
http://who-t.blogspot.com/2023/11/psa-for-xorg-gnome-sessions-use-xf86.html
<p>
TLDR: see the title of this blog post, it's really that trivial.
</p>
<p>
Now that <strike>Godot</strike>Wayland has been coming for ages and all new development focuses on a pile of software
that steams significantly less, we're seeing cracks appear in the old Xorg support. Not intentionally,
but there's only so much time that can be spent on testing and things that are more niche fall through.
One of these was a bug I just had the pleasure of debugging and was triggered by a GNOME on Xorg user using the xf86-input-libinput driver for tablet devices.
</p>
<p>
On the surface of it, this should be fine because libinput (and thus xf86-input-libinput) handles tablets just fine. But libinput is the new kid on the block.
The old kid on said block is the xf86-input-wacom driver, older than libinput by slightly over a decade. And oh man,
history has baked things into the driver that are worse than raisins in apple strudel [1].
</p>
<p>
The xf86-input-libinput driver was written as a wrapper around libinput and makes use of fancy things that (from libinput's POV) have always been around: things like
input device hotplugging. Fancy, I know. For tablet devices the driver creates an
X device for each new tool when it first comes into proximity. Future events from that tool will go through that device. A second tool, be it a new pen or the eraser on the original pen, will create a
second X device and events from that tool will go through that X device. Configuration on any device will thus only affect that particular pen.
Almost like the whole thing makes sense.
</p>
<p>
The wacom driver of course doesn't do this. It pre-creates X devices for some possible types of tools (pen, eraser, and cursor [2] but not airbrush or artpen). When a tool
goes into proximity the events are sent through the respective device, i.e. all pens go through the pen tool, all erasers through the eraser tool.
To actually track pens there is the "Wacom Serial IDs" property that contains the current tool's serial number. If you want to
track multiple tools you need to query the property on proximity in [4]. At the time this was within a reasonable error margin of a good idea.
</p>
<p>
Of course and because MOAR CONFIGURATION! will save us all from the great filter you can specify the "ToolSerials" xorg.conf option as
e.g. "airbrush;12345;artpen" and get some extra X devices pre-created, in this case an airbrush and artpen X device and an
X device just for the tool with the serial number 12345. All other tools multiplex through the default devices. Again, at the time this was a great improvement. [5]
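<p>(For the curious: such an option would sit in an <code>xorg.conf</code> InputClass snippet roughly like the following; the identifier and match rules here are illustrative.)</p>
<pre><code>Section "InputClass"
        Identifier "wacom tool serials"
        MatchDriver "wacom"
        Option "ToolSerials" "airbrush;12345;artpen"
EndSection
</code></pre>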
</p>
<p>
Anyway, where was I? Oh, right. The above should serve as a good approximation of a reason why the xf86-input-libinput driver does
not try to be fully compatible with the xf86-input-wacom driver. In everyday use these things barely matter [6] but for the desktop
environment which needs to configure these devices all these differences mean multiple code paths. Those paths need to be tested but they aren't,
so things fall through the cracks.
</p>
<p>
So quite a while ago, we made the decision that until Xorg goes dodo, the xf86-input-wacom driver is the tablet driver to use in GNOME.
So if you're using a GNOME on Xorg session [7], do make sure the xf86-input-wacom driver is installed. It will make both of us happier and that's a good aim to strive for.
</p>
<p>
<small>
[1] It's just a joke. Put the pitchforks down already.<br />
[2] The cursor is the mouse-like thing Wacom sells. Which is called cursor [3] because the English language has a limited vocabulary and we need to re-use words as much as possible lest we run out of them.<br />
[3] It's also called puck. Because [2].<br />
[4] And by "query" I mean "wait for the XI2 event notifying you of a property change". Because of lolz the driver cannot update the property on proximity in but needs to schedule that as idle func so the
property update for the serial always arrives at some unspecified time after the proximity in but hopefully before more motion events happen. Or not, and that's how hope dies.<br />
[5] Think about this next time someone says they long for some unspecified good old days.<br />
[6] Except the strip axis which on the wacom driver is actually a bit happily moving left/right as your finger moves up/down on the touch strip and any X client needs to know this. libinput normalizes this to...well, a normal value but now the X client needs to know which driver is running so, oh deary deary.<br />
[7] e.g. because you're stockholmed into it by your graphics hardware<br />
</small>
</p>2023-11-10T03:22:46+00:00Simon Ser: Compiling the mainline kernel on a Raspberry Pi running under Arch Linux ARM
https://emersion.fr/blog/2023/raspberry-pi-archlinux-mainline-compilation/
<p>I’ve recently worked on <a href="https://lore.kernel.org/dri-devel/20231109074545.148149-1-contact@emersion.fr/T/">a patch</a> for the vc4 display driver
used on the Raspberry Pi 4. To test this patch, I needed to compile the kernel
and install it, something I know how to do on x86 but not on Raspberry Pi.
Because I’m pretty stubborn I’ve also insisted on making my life harder:</p>
<ul>
<li>I installed Arch Linux ARM as the base system, instead of Raspberry Pi OS or
Raspbian.</li>
<li>I based my patches on top of the mainline kernel, instead of using
<a href="https://github.com/raspberrypi/linux">Raspberry Pi’s tree</a>.</li>
<li>I wanted to install my built kernel alongside the one provided by the
distribution, instead of overwriting it.</li>
</ul>
<p>Raspberry Pi has an <a href="https://www.raspberrypi.com/documentation/computers/linux_kernel.html">official guide to compile the kernel</a>, however it assumes
Raspberry Pi OS, Raspberry Pi’s kernel tree, and overwrites the current kernel.
It was still very useful to get an idea of the process. Still, quite a few
adaptations have been required. This blog post serves as my personal notepad to
remember how to Do It.</p>
<p>First, the official guide instructs us to run <code>make bcm2711_defconfig</code> to
generate the kernel config, however mainline complains with:</p>
<pre><code>Can't find default configuration "arch/arm/configs/bcm2711_defconfig"
</code></pre>
<p>This can be fixed by grabbing this file from the Raspberry Pi tree:</p>
<pre><code>curl -L -o arch/arm/configs/bcm2711_defconfig "https://github.com/raspberrypi/linux/raw/rpi-6.1.y/arch/arm/configs/bcm2711_defconfig"
</code></pre>
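<p>With the config in place, the build itself follows the usual kernel workflow; natively on the Pi that is roughly (adjust the job count to taste):</p>
<pre><code>make bcm2711_defconfig
make -j4 zImage modules dtbs
</code></pre>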
<p>Once that’s done, compiling the kernel as usual works fine. Then we need to
install it to the <code>/boot</code> partition. We can ignore the overlays stuff from the
official guide, we don’t use these. The source paths need to be slightly
adjusted, and the destination paths need to be fixed up to use a subdirectory:</p>
<pre><code>doas make modules_install
doas cp arch/arm/boot/dts/broadcom/*.dtb /boot/custom/
doas cp arch/arm/boot/zImage /boot/custom/kernel7.img
</code></pre>
<p>Then we need to generate an initramfs. At first I forgot to do that step and
the kernel was hanging during USB bus discovery.</p>
<pre><code>doas mkinitcpio --generate /boot/custom/initramfs-linux.img --kernel /boot/custom/kernel7.img
</code></pre>
<p>The last step is updating the boot firmware configuration located at
<code>/boot/config.txt</code>. Comment out any <code>dtoverlay</code> directive, then add
<code>os_prefix=custom/</code> to point the firmware to our subdirectory (note, the final
slash is important).</p>
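<p>The relevant part of <code>/boot/config.txt</code> then ends up looking something like this (the overlay name is just an example):</p>
<pre><code># comment out any dtoverlay directives, e.g.:
#dtoverlay=vc4-kms-v3d

# load kernel, DTBs and initramfs from /boot/custom/ (trailing slash required)
os_prefix=custom/
</code></pre>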
<p>For some reason my memory card was showing up as <code>/dev/mmcblk1</code> instead of
<code>/dev/mmcblk0</code>, so I had to <del>bang my head against the wall until I notice the
difference</del> adjust <code>/boot/cmdline.txt</code> and <code>/etc/fstab</code> accordingly.</p>
<p>That’s it! After a reboot I was ready to start kernel hacking. Thanks to Maíra
Canal for replying to my distress signal on Mastodon and providing
recommendations!</p>2023-11-08T22:00:00+00:00Melissa Wen: AMD Driver-specific Properties for Color Management on Linux (Part 2)
https://melissawen.github.io/blog/2023/11/07/amd-steamdeck-colors-p2
<h2 id="tldr">TL;DR:</h2>
<p>This blog post explores the color capabilities of AMD hardware and how they are
exposed to userspace through driver-specific properties. It discusses the
different color blocks in the AMD Display Core Next (DCN) pipeline and their
capabilities, such as predefined transfer functions, 1D and 3D lookup tables
(LUTs), and color transformation matrices (CTMs). It also highlights the
differences in AMD HW blocks for pre and post-blending adjustments, and how these
differences are reflected in the available driver-specific properties.</p>
<p>Overall, this blog post provides a comprehensive overview of the color
capabilities of AMD hardware and how they can be controlled by userspace
applications through driver-specific properties. This information is valuable
for anyone who wants to develop applications that can take advantage of the AMD
color management pipeline.</p>
<p>Get a closer look at each hardware block’s capabilities, unlock a wealth of
knowledge about AMD display hardware, and enhance your understanding of
graphics and visual computing. Stay tuned for future developments as we embark
on a quest for GPU color capabilities in the ever-evolving realm of rainbow
treasures.</p>
<hr />
<p>Operating Systems can use the power of GPUs to ensure consistent color reproduction
across graphics devices. We can use GPU-accelerated color management to manage the
diversity of color profiles, do color transformations to convert between
High-Dynamic-Range (HDR) and Standard-Dynamic-Range (SDR) content and color
enhancements for wide color gamut (WCG). However, to make use of GPU display
capabilities, we need an interface between userspace and the kernel display drivers
that is currently absent in the Linux/DRM KMS API.</p>
<p>In the previous <a href="https://melissawen.github.io/blog/2023/08/21/amd-steamdeck-colors">blog post</a>
I presented how we are expanding the Linux/DRM color management API to expose
specific properties of AMD hardware. Now, I’ll guide you to the color features
for the Linux/AMD display driver. We embark on a journey through DRM/KMS, AMD
Display Manager, and AMD Display Core and delve into the color blocks to
uncover the secrets of color manipulation within AMD hardware. Here we’ll talk
less about the color tools and more about where to find them in the hardware.</p>
<p>We resort to driver-specific properties to reach AMD hardware blocks with color
capabilities. These blocks provide features like predefined transfer functions,
color transformation matrices, and 1-dimensional (1D LUT) and 3-dimensional
lookup tables (3D LUT). Here, we will understand how these color features are
strategically placed into color blocks both before and after blending in
Display Pipe and Plane (DPP) and Multiple Pipe/Plane Combined (MPC) blocks.</p>
<p>That said, welcome back to the second part of our thrilling journey through
AMD’s color management realm!</p>
<h2 id="amd-display-driver-in-the-linuxdrm-subsystem-the-journey">AMD Display Driver in the Linux/DRM Subsystem: The Journey</h2>
<p>In my 2022 XDC talk <a href="https://www.youtube.com/watch?v=CMm-yhsMB7U">“I’m not an AMD expert, but…”</a>,
I briefly explained the organizational structure of the Linux/AMD display driver where
the driver code is bifurcated into a Linux-specific section and a shared-code
portion. To reveal AMD’s color secrets through the Linux kernel DRM API, our
journey led us through these layers of the Linux/AMD display driver’s software
stack. It includes traversing the DRM/KMS framework, the AMD Display Manager
(DM), and the AMD Display Core (DC)
<a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/index.html">[1]</a>.</p>
<p><img alt="" src="https://raw.githubusercontent.com/melissawen/melissawen.github.io/master/img/drm-amd-driver.svg" /></p>
<p>The DRM/KMS framework provides the atomic API for color management through KMS
properties represented by <code class="language-plaintext highlighter-rouge">struct drm_property</code>. We extended the color
management interface exposed to userspace by leveraging existing resources and
connecting them with driver-specific functions for managing modeset properties.</p>
<p>On the AMD DC layer, the interface with hardware color blocks is established.
The AMD DC layer contains OS-agnostic components that are shared across
different platforms, making it an invaluable resource. This layer already
implements hardware programming and resource management, simplifying the external
developer’s task. While examining the DC code, we gain insights into the color
pipeline and capabilities, even without direct access to specifications.
Additionally, AMD developers provide essential support by answering queries and
reviewing our work upstream.</p>
<p>The primary challenge involved identifying and understanding relevant AMD DC
code to configure each color block in the color pipeline. However, the ultimate
goal was to bridge the DC color capabilities with the DRM API. For this, we
changed the AMD DM, the OS-dependent layer connecting the
DC interface to the DRM/KMS framework. We defined and managed driver-specific
color properties, facilitated the transport of user space data to the DC, and
translated DRM features and settings to the DC interface. Considerations were
also made for differences in the color pipeline based on hardware capabilities.</p>
<h2 id="exploring-color-capabilities-of-the-amd-display-hardware">Exploring Color Capabilities of the AMD display hardware</h2>
<p>Now, let’s dive into the exciting realm of AMD color capabilities, where an
abundance of techniques and tools awaits to make your colors look extraordinary
across diverse devices.</p>
<p>First, we need to know a little about the color transformation and calibration
tools and techniques that you can find in different blocks of the AMD hardware.
I borrowed some images from
<a href="https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/chapter-24-using-lookup-tables-accelerate-color">[2]</a>
<a href="https://developer.nvidia.com/gpugems/gpugems3/part-iv-image-effects/chapter-24-importance-being-linear">[3]</a>
<a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dcn-overview.html">[4]</a>
to help you understand the information.</p>
<h4 id="predefined-transfer-functions-named-fixed-curves">Predefined Transfer Functions (Named Fixed Curves):</h4>
<p>Transfer functions serve as the bridge between the digital and visual worlds,
defining the mathematical relationship between digital color values and linear
scene/display values and ensuring consistent color reproduction across different
devices and media. You can learn more about curves in the chapter
<a href="https://developer.nvidia.com/gpugems/gpugems3/part-iv-image-effects/chapter-24-importance-being-linear">GPU Gems 3 - The Importance of Being Linear by Larry Gritz and Eugene d’Eon</a>.</p>
<p>ITU-R 2100 introduces three main types of transfer functions:</p>
<blockquote>
<ul>
<li>OETF: the opto-electronic transfer function, which converts linear scene light into the video signal, typically within a camera.</li>
<li>EOTF: electro-optical transfer function, which converts the video signal into the linear light output of the display.</li>
<li>OOTF: opto-optical transfer function, which has the role of applying the “rendering intent”.</li>
</ul>
</blockquote>
<p>AMD’s display driver supports the following pre-defined transfer functions (aka
named fixed curves):</p>
<blockquote>
<ul>
<li>Linear/Unity: linear/identity relationship between pixel value and luminance value;</li>
<li>Gamma 2.2, Gamma 2.4, Gamma 2.6: pure power functions;</li>
<li>sRGB: the piece-wise transfer function from IEC 61966-2-1:1999 (close to a pure 2.4 gamma);</li>
<li>BT.709: has a linear segment in the bottom part and then a power function with a 0.45 (~1/2.22) gamma for the rest of the range; standardized by ITU-R BT.709-6;</li>
<li>PQ (Perceptual Quantizer): used for HDR display, allows luminance range capability of 0 to 10,000 nits; standardized by SMPTE ST 2084.</li>
</ul>
</blockquote>
<p>These capabilities vary depending on the hardware block, with some utilizing
hardcoded curves and others relying on AMD’s color module to construct curves
from standardized coefficients. It also supports user/custom curves built from
a lookup table.</p>
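<p>To make the curve shapes concrete, here is the sRGB piece-wise EOTF from IEC 61966-2-1 sketched in plain C. This is only the math of the named curve, not driver code:</p>
<pre><code>#include <assert.h>
#include <math.h>

/* sRGB EOTF: encoded value in [0, 1] -> linear light in [0, 1].
 * Piece-wise: a linear toe at the bottom, a 2.4 power law above it. */
static double srgb_eotf(double encoded)
{
    if (encoded > 0.04045)
        return pow((encoded + 0.055) / 1.055, 2.4);
    return encoded / 12.92;
}

int main(void)
{
    assert(srgb_eotf(0.0) == 0.0);
    assert(1e-9 > fabs(srgb_eotf(1.0) - 1.0));
    /* encoded mid-gray (0.5) is roughly 21% linear light */
    assert(srgb_eotf(0.5) > 0.21 && 0.22 > srgb_eotf(0.5));
    return 0;
}
</code></pre>
<p>The other named curves (the pure gamma functions, BT.709 and PQ) follow the same pattern with different constants and segment boundaries.</p>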
<h4 id="1d-luts-1-dimensional-lookup-table">1D LUTs (1-dimensional Lookup Table):</h4>
<p>A 1D LUT is a versatile tool, defining a one-dimensional color transformation
based on a single parameter. It’s very well explained by Jeremy Selan at
<a href="https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/chapter-24-using-lookup-tables-accelerate-color">GPU Gems 2 - Chapter 24 Using Lookup Tables to Accelerate Color Transformations</a></p>
<p>It enables adjustments to color, brightness, and contrast, making it
ideal for fine-tuning. In the Linux AMD display driver, the atomic API offers a
1D LUT with 4096 entries and 8-bit depth, while legacy gamma uses a size of
256.</p>
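<p>As a sketch of what userspace hands over, the following builds an identity ramp in the same layout as the UAPI <code>struct drm_color_lut</code> (mirrored locally here to keep the example self-contained; entries are 16-bit per channel in the UAPI regardless of the effective hardware depth):</p>
<pre><code>#include <assert.h>
#include <stdint.h>

/* Local mirror of the DRM UAPI struct drm_color_lut layout:
 * one 16-bit unsigned value per channel, full range 0x0000..0xFFFF. */
struct color_lut_entry {
    uint16_t red, green, blue, reserved;
};

enum { LUT_SIZE = 4096 };

/* Value of entry i of an identity ramp, rescaled into the 16-bit range. */
static uint16_t ramp_value(int i, int size)
{
    return (uint16_t)((uint32_t)i * 0xFFFF / (size - 1));
}

int main(void)
{
    static struct color_lut_entry lut[LUT_SIZE];

    for (int i = LUT_SIZE - 1; i >= 0; i--)
        lut[i].red = lut[i].green = lut[i].blue = ramp_value(i, LUT_SIZE);

    assert(lut[0].red == 0);
    assert(lut[LUT_SIZE - 1].blue == 0xFFFF);
    return 0;
}
</code></pre>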
<h4 id="3d-luts-3-dimensional-lookup-table">3D LUTs (3-dimensional Lookup Table):</h4>
<p>These tables work in three dimensions – red, green, and blue. They’re perfect for
complex color transformations and adjustments between color channels. They are also
more complex to manage and require more computational resources. Jeremy also explains 3D LUTs at
<a href="https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/chapter-24-using-lookup-tables-accelerate-color">GPU Gems 2 - Chapter 24 Using Lookup Tables to Accelerate Color Transformations</a></p>
<h4 id="ctm-color-transformation-matrices">CTM (Color Transformation Matrices):</h4>
<p>Color transformation matrices facilitate the transition between different color
spaces, playing a crucial role in color space conversion.</p>
<h4 id="hdr-multiplier">HDR Multiplier:</h4>
<p>HDR multiplier is a factor applied to the color values of an image to increase their overall brightness.</p>
<h2 id="amd-color-capabilities-in-the-hardware-pipeline">AMD Color Capabilities in the Hardware Pipeline</h2>
<p>First, let’s take a closer look at the AMD Display Core Next hardware pipeline in
<a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dcn-overview.html">the Linux kernel documentation for AMDGPU driver - Display Core Next</a></p>
<p><a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/dcn-overview.html" title="Linux Kernel Documentation - GPU Driver Documentation > AMDGPU driver > Display Core Next (DCN)"><img alt="" src="https://github.com/melissawen/melissawen.github.io/blob/master/img/dc_pipeline_overview_2.png?raw=true" /></a></p>
<p>In the AMD Display Core Next hardware pipeline, we encounter two hardware
blocks with color capabilities: the Display Pipe and Plane (DPP) and the
Multiple Pipe/Plane Combined (MPC). The DPP handles color adjustments per plane
before blending, while the MPC engages in post-blending color adjustments.
In short, we expect DPP color capabilities to match up with DRM plane
properties, and MPC color capabilities to play nice with DRM CRTC properties.</p>
<p><em>Note: here’s the catch – there are some DRM CRTC color transformations that
don’t have a corresponding AMD MPC color block, and vice versa. It’s like a
puzzle, and we’re here to solve it!</em></p>
<h3 id="amd-color-blocks-and-capabilities">AMD Color Blocks and Capabilities</h3>
<p>We can finally talk about the color capabilities of each AMD color block. As it
varies based on the generation of hardware, let’s take the DCN3+ family as
reference. What’s possible to do before and after blending depends on hardware
capabilities described in the kernel driver by <a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/display-manager.html#c.dpp_color_caps"><code class="language-plaintext highlighter-rouge">struct
dpp_color_caps</code></a>
and <a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/display-manager.html#c.mpc_color_caps"><code class="language-plaintext highlighter-rouge">struct
mpc_color_caps</code></a>.</p>
<p>The AMD Steam Deck hardware provides a tangible example of these capabilities.
Therefore, we take SteamDeck/DCN301 driver as an example and look at the “Color
pipeline capabilities” described in the file:
<a href="https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c"><code class="language-plaintext highlighter-rouge">drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c</code></a></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Color pipeline capabilities */
dc->caps.color.dpp.dcn_arch = 1; // If it is a Display Core Next (DCN): yes. Zero means DCE.
dc->caps.color.dpp.input_lut_shared = 0;
dc->caps.color.dpp.icsc = 1; // Input Color Space Conversion (CSC) matrix.
dc->caps.color.dpp.dgam_ram = 0; // The old degamma block for degamma curve (hardcoded and LUT). `Gamma correction` is the new one.
dc->caps.color.dpp.dgam_rom_caps.srgb = 1; // sRGB hardcoded curve support
dc->caps.color.dpp.dgam_rom_caps.bt2020 = 1; // BT2020 hardcoded curve support (seems not actually in use)
dc->caps.color.dpp.dgam_rom_caps.gamma2_2 = 1; // Gamma 2.2 hardcoded curve support
dc->caps.color.dpp.dgam_rom_caps.pq = 1; // PQ hardcoded curve support
dc->caps.color.dpp.dgam_rom_caps.hlg = 1; // HLG hardcoded curve support
dc->caps.color.dpp.post_csc = 1; // CSC matrix
dc->caps.color.dpp.gamma_corr = 1; // New `Gamma Correction` block for degamma user LUT;
dc->caps.color.dpp.dgam_rom_for_yuv = 0;
dc->caps.color.dpp.hw_3d_lut = 1; // 3D LUT support. If so, it's always preceded by a shaper curve.
dc->caps.color.dpp.ogam_ram = 1; // `Blend Gamma` block for custom curve just after blending
// no OGAM ROM on DCN301
dc->caps.color.dpp.ogam_rom_caps.srgb = 0;
dc->caps.color.dpp.ogam_rom_caps.bt2020 = 0;
dc->caps.color.dpp.ogam_rom_caps.gamma2_2 = 0;
dc->caps.color.dpp.ogam_rom_caps.pq = 0;
dc->caps.color.dpp.ogam_rom_caps.hlg = 0;
dc->caps.color.dpp.ocsc = 0;
dc->caps.color.mpc.gamut_remap = 1; // Post-blending CTM (pre-blending CTM is always supported)
dc->caps.color.mpc.num_3dluts = pool->base.res_cap->num_mpc_3dlut; // Post-blending 3D LUT (preceded by shaper curve)
dc->caps.color.mpc.ogam_ram = 1; // Post-blending regamma.
// No pre-defined TF supported for regamma.
dc->caps.color.mpc.ogam_rom_caps.srgb = 0;
dc->caps.color.mpc.ogam_rom_caps.bt2020 = 0;
dc->caps.color.mpc.ogam_rom_caps.gamma2_2 = 0;
dc->caps.color.mpc.ogam_rom_caps.pq = 0;
dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
dc->caps.color.mpc.ocsc = 1; // Output CSC matrix.
</code></pre></div></div>
<p>I included some inline comments in each element of the color caps to quickly
describe them, but you can find the same information in the Linux kernel
documentation. See more in
<a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/display-manager.html#c.dpp_color_caps"><code class="language-plaintext highlighter-rouge">struct dpp_color_caps</code></a>,
<a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/display-manager.html#c.mpc_color_caps"><code class="language-plaintext highlighter-rouge">struct mpc_color_caps</code></a>
and <a href="https://dri.freedesktop.org/docs/drm/gpu/amdgpu/display/display-manager.html#c.rom_curve_caps"><code class="language-plaintext highlighter-rouge">struct rom_curve_caps</code></a>.</p>
<p>Now, using this guideline, we go through color capabilities of DPP and MPC blocks and talk more
about mapping driver-specific properties to corresponding color blocks.</p>
<h2 id="dpp-color-pipeline-before-blending-per-plane">DPP Color Pipeline: Before Blending (Per Plane)</h2>
<p>Let’s explore the capabilities of DPP blocks and what you can achieve with a
color block. The very first thing to pay attention to is the display architecture
of the hardware: AMD previously used a display architecture called DCE
(Display and Compositing Engine), while newer hardware follows DCN (Display Core
Next).</p>
<p><em>The architecture is described by: <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.dcn_arch</code></em></p>
<h3 id="amd-plane-degamma-tf-and-1d-lut">AMD Plane Degamma: TF and 1D LUT</h3>
<p><em>Described by: <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.dgam_ram</code>, <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.dgam_rom_caps</code>,<code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.gamma_corr</code></em></p>
<p>AMD Plane Degamma data is mapped to the initial stage of the DPP pipeline. It
is utilized to transition from scanout/encoded values to linear values for
arithmetic operations. Plane Degamma supports both pre-defined transfer
functions and 1D LUTs, depending on the hardware generation. DCN2 and older
families handle both types of curve in the Degamma RAM block
(<code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.dgam_ram</code>); DCN3+ separates hardcoded curves and the 1D LUT
into two blocks: the Degamma ROM (<code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.dgam_rom_caps</code>) and the Gamma
Correction block (<code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.gamma_corr</code>), respectively.</p>
<p>Pre-defined transfer functions:</p>
<ul>
<li>they are hardcoded curves (read-only memory - ROM);</li>
<li>supported curves: sRGB EOTF, BT.709 inverse OETF, PQ EOTF and HLG OETF, Gamma
2.2, Gamma 2.4 and Gamma 2.6 EOTF.</li>
</ul>
<p>The 1D LUT currently accepts 4096 entries of 8-bit. The data is interpreted as an array
of <code class="language-plaintext highlighter-rouge">struct drm_color_lut</code> elements. Setting TF = Identity/Default and LUT as
NULL means bypass.</p>
<p>References:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-5-mwen@igalia.com/">[PATCH v3 04/32] drm/amd/display: add driver-specific property for plane degamma LUT</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-6-mwen@igalia.com/">[PATCH v3 05/32] drm/amd/display: add plane degamma TF driver-specific property</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-20-mwen@igalia.com/">[PATCH v3 19/32] drm/amd/display: add plane degamma TF and LUT support</a></li>
</ul>
<h3 id="amd-plane-3x4-ctm-color-transformation-matrix">AMD Plane 3x4 CTM (Color Transformation Matrix)</h3>
<p>AMD Plane CTM data goes to the DPP Gamut Remap block, supporting a 3x4 fixed
point (s31.32) matrix for color space conversions. The data is interpreted as a
<code class="language-plaintext highlighter-rouge">struct drm_color_ctm_3x4</code>. Setting NULL means bypass.</p>
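<p>The s31.32 sign-magnitude fixed-point encoding (sign in bit 63, 32 fractional bits) tends to trip people up, so here is a small self-contained sketch of how userspace could build such a matrix; the helper names are mine, not part of any API:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Convert a double to the S31.32 sign-magnitude fixed point used by
 * struct drm_color_ctm / drm_color_ctm_3x4. */
static uint64_t to_s31_32(double v)
{
	uint64_t sign = 0;

	if (v < 0) {
		sign = 1ULL << 63;
		v = -v;
	}
	return sign | (uint64_t)(v * (1ULL << 32));
}

/* Identity 3x4 matrix: a 3x3 identity plus a zero offset column, in
 * the row-major 12-entry layout of struct drm_color_ctm_3x4. */
static void ctm_3x4_identity(uint64_t m[12])
{
	for (int i = 0; i < 12; i++)
		m[i] = 0;
	m[0] = m[5] = m[10] = to_s31_32(1.0);
}
```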
<p>References:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-31-mwen@igalia.com/">[PATCH v3 30/32] drm/amd/display: add plane CTM driver-specific property</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-32-mwen@igalia.com/">[PATCH v3 31/32] drm/amd/display: add plane CTM support</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-33-mwen@igalia.com/">[PATCH v3 32/32] drm/amd/display: Add 3x4 CTM support for plane CTM</a></li>
</ul>
<h3 id="amd-plane-shaper-tf--1d-lut">AMD Plane Shaper: TF + 1D LUT</h3>
<p><em>Described by: <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.hw_3d_lut</code></em></p>
<p>The Shaper block fine-tunes color adjustments before applying the 3D LUT,
optimizing the use of the limited entries in each dimension of the 3D LUT. On
AMD hardware, a 3D LUT always means a preceding shaper 1D LUT used for
delinearizing and/or normalizing the color space before applying a 3D LUT, so
this entry on DPP color caps <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.hw_3d_lut</code> means support for
both shaper 1D LUT and 3D LUT.</p>
<p>A pre-defined transfer function enables delinearizing content with or without a
shaper LUT, where the AMD color module calculates the resulting shaper curve. Shaper
curves go from linear values to encoded values. If we are already in a
non-linear space and/or don’t need to normalize values, we can set an Identity TF
for the shaper, which works similarly to bypass and is also the default TF value.</p>
<p>Pre-defined transfer functions:</p>
<ul>
<li>there is no DPP Shaper ROM. Curves are calculated by AMD color modules. Check
<code class="language-plaintext highlighter-rouge">calculate_curve()</code> function in the file
<a href="https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/amd/display/modules/color/color_gamma.c"><code class="language-plaintext highlighter-rouge">amd/display/modules/color/color_gamma.c</code></a>.</li>
<li>supported curves: Identity, sRGB inverse EOTF, BT.709 OETF, PQ inverse EOTF,
HLG OETF, and Gamma 2.2, Gamma 2.4, Gamma 2.6 inverse EOTF.</li>
</ul>
<p>The 1D LUT currently accepts 4096 entries of 8-bit. The data is interpreted as an array
of <code class="language-plaintext highlighter-rouge">struct drm_color_lut</code> elements. When setting Plane Shaper TF (!= Identity)
and LUT at the same time, the color module will combine the pre-defined TF and
the custom LUT values into the LUT that’s actually programmed. Setting TF =
Identity/Default and LUT as NULL works as bypass.</p>
<p>References:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-11-mwen@igalia.com/">[PATCH v3 10/32] drm/amd/display: add plane shaper LUT and TF</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-24-mwen@igalia.com/">[PATCH v3 23/32] drm/amd/display: add plane shaper LUT support</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-25-mwen@igalia.com/">[PATCH v3 24/32] drm/amd/display: add plane shaper TF support</a></li>
</ul>
<h3 id="amd-plane-3d-lut">AMD Plane 3D LUT</h3>
<p><em>Described by: <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.hw_3d_lut</code></em></p>
<p>The 3D LUT in the DPP block facilitates complex color transformations and
adjustments. A 3D LUT is a three-dimensional array where each element is an RGB
triplet. As mentioned before, <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.hw_3d_lut</code> describes whether the
DPP 3D LUT is supported.</p>
<p>The AMD driver-specific interface advertises the size of a single dimension via the
<code class="language-plaintext highlighter-rouge">LUT3D_SIZE</code> property. Plane 3D LUT is a blob property where the data is interpreted
as an array of <code class="language-plaintext highlighter-rouge">struct drm_color_lut</code> elements and the number of entries is
<code class="language-plaintext highlighter-rouge">LUT3D_SIZE</code> cubed. The array contains samples from the approximated function;
values between samples are estimated by tetrahedral interpolation.
The array is accessed with three indices, one for each input dimension (color
channel), blue being the outermost dimension, red the innermost. This
distribution is better visualized when examining the code in
<a href="https://lore.kernel.org/dri-devel/20221004211451.1475215-6-alex.hung@amd.com/">[RFC PATCH 5/5] drm/amd/display: Fill 3D LUT from userspace by Alex Hung</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+ for (nib = 0; nib < 17; nib++) {
+ for (nig = 0; nig < 17; nig++) {
+ for (nir = 0; nir < 17; nir++) {
+ ind_lut = 3 * (nib + 17*nig + 289*nir);
+
+ rgb_area[ind].red = rgb_lib[ind_lut + 0];
+ rgb_area[ind].green = rgb_lib[ind_lut + 1];
+ rgb_area[ind].blue = rgb_lib[ind_lut + 2];
+ ind++;
+ }
+ }
+ }
</code></pre></div></div>
<p>In our driver-specific approach we opted to advertise its behavior to userspace
instead of dealing with it implicitly in the kernel driver.
AMD hardware supports 3D LUTs of size 17 or size 9 (4913 and 729 entries
respectively), with either 10-bit or 12-bit precision. In the current
driver-specific work we focus on enabling only the 17-size, 12-bit 3D LUT, as in
<a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-26-mwen@igalia.com/">[PATCH v3 25/32] drm/amd/display: add plane 3D LUT support</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+ /* Stride and bit depth are not programmable by API yet.
+ * Therefore, only supports 17x17x17 3D LUT (12-bit).
+ */
+ lut->lut_3d.use_tetrahedral_9 = false;
+ lut->lut_3d.use_12bits = true;
+ lut->state.bits.initialized = 1;
+ __drm_3dlut_to_dc_3dlut(drm_lut, drm_lut3d_size, &lut->lut_3d,
+ lut->lut_3d.use_tetrahedral_9,
+ MAX_COLOR_3DLUT_BITDEPTH);
</code></pre></div></div>
<p>A refined control of 3D LUT parameters should go through a follow-up version or generic API.</p>
<p>Setting 3D LUT to NULL means bypass.</p>
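<p>To illustrate the flattening described above, here is a self-contained sketch that builds an identity 17x17x17 LUT using the same index formula as the snippet from Alex Hung's patch (<code class="language-plaintext highlighter-rouge">nib + 17*nig + 289*nir</code>); the struct is a local stand-in for the kernel's <code class="language-plaintext highlighter-rouge">drm_color_lut</code>:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for struct drm_color_lut from <drm/drm_mode.h>. */
struct drm_color_lut {
	uint16_t red;
	uint16_t green;
	uint16_t blue;
	uint16_t reserved;
};

#define LUT3D_DIM 17
#define LUT3D_ENTRIES (LUT3D_DIM * LUT3D_DIM * LUT3D_DIM) /* 4913 */

/* Fill an identity 3D LUT: each sample maps the grid point back to
 * itself, using the flat index nib + 17*nig + 289*nir. */
static void fill_identity_3dlut(struct drm_color_lut lut[LUT3D_ENTRIES])
{
	for (int nir = 0; nir < LUT3D_DIM; nir++)
		for (int nig = 0; nig < LUT3D_DIM; nig++)
			for (int nib = 0; nib < LUT3D_DIM; nib++) {
				int i = nib + LUT3D_DIM * nig +
					LUT3D_DIM * LUT3D_DIM * nir;

				lut[i].red   = nir * 0xffff / (LUT3D_DIM - 1);
				lut[i].green = nig * 0xffff / (LUT3D_DIM - 1);
				lut[i].blue  = nib * 0xffff / (LUT3D_DIM - 1);
				lut[i].reserved = 0;
			}
}
```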
<p>References:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-10-mwen@igalia.com/">[PATCH v3 09/32] drm/amd/display: add plane 3D LUT driver-specific properties</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-26-mwen@igalia.com/">[PATCH v3 25/32] drm/amd/display: add plane 3D LUT support</a></li>
</ul>
<h3 id="amd-plane-blendout-gamma-tf--1d-lut">AMD Plane Blend/Out Gamma: TF + 1D LUT</h3>
<p><em>Described by: <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.ogam_ram</code></em></p>
<p>The Blend/Out Gamma block applies the final touch-up before blending, allowing
users to linearize content after the 3D LUT and just before blending. It supports both a 1D LUT
and pre-defined TFs. We can see the Shaper and Blend LUTs as 1D LUTs that
sandwich the 3D LUT. So, if we don’t need 3D LUT transformations, we may want
to use only the Degamma block to linearize and skip Shaper, 3D LUT and Blend.</p>
<p>Pre-defined transfer function:</p>
<ul>
<li>there is no DPP Blend ROM. Curves are calculated by AMD color modules;</li>
<li>supported curves: Identity, sRGB EOTF, BT.709 inverse OETF, PQ EOTF, HLG
inverse OETF, and Gamma 2.2, Gamma 2.4, Gamma 2.6 EOTF.</li>
</ul>
<p>The 1D LUT currently accepts 4096 entries of 8-bit. The data is interpreted as an array
of <code class="language-plaintext highlighter-rouge">struct drm_color_lut</code> elements. If <code class="language-plaintext highlighter-rouge">plane_blend_tf_property</code> != Identity TF,
AMD color module will combine the user LUT values with pre-defined TF into the
LUT parameters to be programmed. Setting TF = Identity/Default and LUT to NULL
means bypass.</p>
<p>References:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-12-mwen@igalia.com/">[PATCH v3 11/32] drm/amd/display: add plane blend</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-28-mwen@igalia.com/">[PATCH v3 27/32] drm/amd/display: add plane blend LUT and TF support </a></li>
</ul>
<h2 id="mpc-color-pipeline-after-blending-per-crtc">MPC Color Pipeline: After Blending (Per CRTC)</h2>
<h3 id="drm-crtc-degamma-1d-lut">DRM CRTC Degamma 1D LUT</h3>
<p>The degamma lookup table (LUT) converts framebuffer pixel data before
applying the color conversion matrix. The data is interpreted as an array of
<code class="language-plaintext highlighter-rouge">struct drm_color_lut</code> elements. Setting NULL means bypass.</p>
<p>It is not really supported as a separate block. The driver currently reuses the DPP degamma LUT block
(<code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.dgam_ram</code> and <code class="language-plaintext highlighter-rouge">dc->caps.color.dpp.gamma_corr</code>) to
support the DRM CRTC Degamma LUT, as explained in <a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-21-mwen@igalia.com/">[PATCH v3 20/32]
drm/amd/display: reject atomic commit if setting both plane and CRTC
degamma</a>.</p>
<h3 id="drm-crtc-3x3-ctm">DRM CRTC 3x3 CTM</h3>
<p><em>Described by: <code class="language-plaintext highlighter-rouge">dc->caps.color.mpc.gamut_remap</code></em></p>
<p>It sets the current transformation matrix (CTM) applied to pixel data after the
lookup through the degamma LUT and before the lookup through the gamma LUT. The
data is interpreted as a <code class="language-plaintext highlighter-rouge">struct drm_color_ctm</code>. Setting NULL means bypass.</p>
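<p>For illustration, applying such a 3x3 matrix to a linear pixel looks like this in a self-contained sketch (the hardware does this in fixed point; the helper names are mine):</p>

```c
#include <assert.h>
#include <stdint.h>

/* Decode the S31.32 sign-magnitude fixed point used by struct
 * drm_color_ctm back to a double. */
static double from_s31_32(uint64_t v)
{
	double d = (double)(v & ~(1ULL << 63)) / (double)(1ULL << 32);

	return (v & (1ULL << 63)) ? -d : d;
}

/* Apply a 3x3 CTM (row-major, as in drm_color_ctm.matrix[9]) to one
 * linear RGB pixel. */
static void apply_ctm(const uint64_t m[9], const double in[3],
		      double out[3])
{
	for (int row = 0; row < 3; row++)
		out[row] = from_s31_32(m[3 * row + 0]) * in[0] +
			   from_s31_32(m[3 * row + 1]) * in[1] +
			   from_s31_32(m[3 * row + 2]) * in[2];
}
```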
<h3 id="drm-crtc-gamma-1d-lut--amd-crtc-gamma-tf">DRM CRTC Gamma 1D LUT + AMD CRTC Gamma TF</h3>
<p><em>Described by: <code class="language-plaintext highlighter-rouge">dc->caps.color.mpc.ogam_ram</code></em></p>
<p>After all that, you might still want to convert the content to wire encoding.
No worries; in addition to the DRM CRTC 1D LUT, we’ve got an AMD CRTC gamma transfer
function (TF) to make it happen. Possible TF values are defined by <code class="language-plaintext highlighter-rouge">enum
amdgpu_transfer_function</code>.</p>
<p>Pre-defined transfer functions:</p>
<ul>
<li>there is no MPC Gamma ROM. Curves are calculated by AMD color modules.</li>
<li>supported curves: Identity, sRGB inverse EOTF, BT.709 OETF, PQ inverse EOTF,
HLG OETF, and Gamma 2.2, Gamma 2.4, Gamma 2.6 inverse EOTF.</li>
</ul>
<p>The 1D LUT currently accepts 4096 entries of 8-bit. The data is interpreted as an array
of <code class="language-plaintext highlighter-rouge">struct drm_color_lut</code> elements. When setting CRTC Gamma TF (!= Identity)
and LUT at the same time, the color module will combine the pre-defined TF and
the custom LUT values into the LUT that’s actually programmed. Setting TF =
Identity/Default and LUT to NULL means bypass.</p>
<p>References:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-13-mwen@igalia.com/">[PATCH v3 12/32] drm/amd/display: add CRTC gamma TF driver-specific property</a></li>
<li><a href="https://lore.kernel.org/amd-gfx/20230925194932.1329483-16-mwen@igalia.com/">[PATCH v3 15/32] drm/amd/display: add CRTC gamma TF support </a></li>
</ul>
<h3 id="others">Others</h3>
<h4 id="amd-crtc-shaper-and-3d-lut">AMD CRTC Shaper and 3D LUT</h4>
<p>We have previously worked on exposing <a href="https://lore.kernel.org/dri-devel/20230423141051.702990-27-mwen@igalia.com/">CRTC shaper</a>
and <a href="https://lore.kernel.org/dri-devel/20230423141051.702990-25-mwen@igalia.com/">CRTC 3D LUT</a>,
but they were removed from the AMD driver-specific color series because they
lack a userspace use case. CRTC shaper and 3D LUT work similarly to the plane shaper and
3D LUT, but after blending (MPC block). The difference here is that setting (not
bypassing) the Shaper and Gamma blocks together is not expected, since both blocks
are used to delinearize the input space. In summary, we either set Shaper + 3D
LUT or Gamma.</p>
<h4 id="input-and-output-color-space-conversion">Input and Output Color Space Conversion</h4>
<p>There are two other color capabilities of AMD display hardware that were
integrated into DRM by previous work and are worth a brief explanation here. The DC
Input CSC sets pre-defined coefficients from the values of the DRM plane
<code class="language-plaintext highlighter-rouge">color_range</code> and <code class="language-plaintext highlighter-rouge">color_encoding</code> properties. It is used for color space
conversion of the input content. On the other hand, the DC Output CSC
(OCSC) sets pre-defined coefficients from the DRM connector <code class="language-plaintext highlighter-rouge">colorspace</code>
property. It is used for color space conversion of the composed image to the
one supported by the sink.</p>
<p>References:</p>
<ul>
<li><a href="https://lore.kernel.org/amd-gfx/20220616012127.793375-1-joshua@froggi.es/">[PATCH] amd/display/dc: Fix COLOR_ENCODING and COLOR_RANGE doing nothing for DCN20+</a></li>
<li><a href="https://www.youtube.com/watch?v=Gg4eSAP1uc4&t=4010s">[PATCH v6 00/13] Enable Colorspace connector property in amdgpu</a></li>
</ul>
<h2 id="the-search-for-rainbow-treasures-is-not-over-yet">The search for rainbow treasures is not over yet</h2>
<p>If you want to understand a little more about this work, be sure to watch the two talks Joshua and I presented at XDC 2023 about AMD/Steam Deck colors on Gamescope:</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=Gg4eSAP1uc4&t=2553s">The rainbow treasure map: advanced color management on Linux with AMD/Steam Deck</a></li>
<li><a href="https://indico.freedesktop.org/event/4/contributions/202/">Rainbow Frogs: HDR + Color Management in Gamescope/SteamOS</a></li>
</ul>
<p>In the time between the first and second part of this blog post,
<a href="https://lore.kernel.org/dri-devel/20230829160422.1251087-1-uma.shankar@intel.com/">Uma Shankar and Chaitanya Kumar Borah published the plane color pipeline for Intel</a>
and <a href="https://lore.kernel.org/dri-devel/20230908150235.75918-1-harry.wentland@amd.com/">Harry Wentland implemented a generic API for DRM based on VKMS support</a>.
We discussed these two proposals and the next steps for Color on Linux during <a href="https://indico.freedesktop.org/event/4/contributions/187/">the Color Management workshop at XDC 2023</a> and I briefly shared workshop results in the <a href="https://www.youtube.com/watch?v=Gg4eSAP1uc4&t=4010s">2023 XDC lightning talk session</a>.</p>
<p>The search for rainbow treasures is not over yet! We plan to meet again next year in the 2024 Display Hackfest in Coruña-Spain (Igalia’s HQ) to keep up the pace and continue advancing today’s display needs on Linux.</p>
<p>Finally, a HUGE thank you to everyone who worked with me on exploring AMD’s color capabilities and making them available in userspace.</p>2023-11-07T08:12:00+00:00Tomeu Vizoso: Etnaviv NPU update 10: Upstreaming and TP jobs update
https://blog.tomeuvizoso.net/2023/11/etnaviv-npu-update-10-upstreaming-and.html
<p> If you remember the <a href="https://blog.tomeuvizoso.net/2023/10/etnaviv-npu-update-9-we-got-there.html">last update</a> two weeks ago, I got MobileNetV1 working with good performance, and I was planning to move to upstreaming my changes to the Linux kernel and <a href="https://www.mesa3d.org/">Mesa</a>.</p><p>One of the kernel patches is now queued for the 6.7 release of the Linux kernel, and the other one has just been resent for reviews.</p><p>Regarding Mesa, I have made several cleanups and have started getting great <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25714">review comments</a> from <a href="https://github.com/austriancoder">Christian Gmeiner</a>.</p><p>While waiting for feedback, I have started work on using the TP cores for tensor manipulation, which should be many times faster than the naive code I was running on the CPU for this.</p><p>Got some jobs producing the correct results, but I'm facing a problem with the GPU hanging right afterwards. Have already made a pass at the whole set of data that is sent to the HW (unit configuration, command stream and registers), but haven't found yet the problem. I will next improve the tooling around this and get a better view of the differences.</p><p>I hacked Mesa to use the out-of-tree driver and my code works that way, so it has to be something at the kernel driver.</p><p>During the next weeks I will keep incorporating feedback and see how I can fix the GPU hang on TP jobs.<br /></p><p><br /></p>2023-11-06T09:30:00+00:00Dave Airlie (blogspot): nouveau GSP firmware support - current state
https://airlied.blogspot.com/2023/11/nouveau-gsp-firmware-support-current.html
<p>Linus has pulled the initial GSP firmware support for nouveau. This is just the first set of work to use the new GSP firmware and there are likely many challenges and improvements ahead.</p><p>To get this working you need to install the firmware which hasn't landed in linux-firmware yet.</p><p>For Fedora this copr has the firmware in the necessary places:<br /></p><p><a href="https://copr.fedorainfracloud.org/coprs/airlied/nouveau-gsp/build/6593115/">https://copr.fedorainfracloud.org/coprs/airlied/nouveau-gsp/build/6593115/ </a></p><p>Hopefully we can upstream that in next week or so.<br /></p><p>If you have an ADA based GPU then it should just try and work out of the box, if you have Turing or Ampere you currently need to pass nouveau.config=NvGspRm=1 on the kernel command line to attempt to use GSP.</p><p>Going forward, I've got a few fixes and stabilization bits to land, which we will concentrate on for 6.7, then going forward we have to work out how to keep it up to date and support new hardware and how to add new features.<br /></p><p><br /></p>2023-11-05T20:23:40+00:00Olivier Fourdan: Xwayland rootful - part 2
https://ofourdan.blogspot.com/2023/11/xwayland-rootful-part-2.html
<p>This is the second part of the Xwayland rootful post, <a href="https://ofourdan.blogspot.com/2023/10/xwayland-rootful-part1.html">the first part is there</a>. </p><h4 style="text-align: left;">Using Xwayland rootful to run a full X11 desktop</h4><p>Xwayland rootful can run more than just a window manager, it can as well run an entire X11 desktop, for example with Xfce:</p><div style="text-align: left;"><span style="background-color: #eeeeee; font-family: Source Code Pro;">$ Xwayland -geometry 1024x768 -decorate :12 &</span></div><div style="text-align: left;"><span style="background-color: #eeeeee;"><span style="font-family: Source Code Pro;">$ </span><span style="font-family: Source Code Pro;">DISPLAY=:12 SESSION_MANAGER= GDK_BACKEND=x11 dbus-run-session startxfce4</span></span></div><div class="separator" style="clear: both;"><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSb8E6lEYJ3ehhYeqSxquQYse3Y9daXVqkkBfzBm-9Wjv21aNbK0O6BRdur1inm1cLI8QdWXRGkYKE8Qn-__QJNoqmfscHVpQWKTium068IGXQMXpHaX_R2fdlamHdcE_4F_LJ-Jc5sDZLuA0gmBLOAGonLYMcf3JsYsZX1YWtT3jlAfHVAO3nX-KeFew/s1918/Screenshot%20from%202023-10-20%2012-25-55.png" style="margin-left: auto; margin-right: auto;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSb8E6lEYJ3ehhYeqSxquQYse3Y9daXVqkkBfzBm-9Wjv21aNbK0O6BRdur1inm1cLI8QdWXRGkYKE8Qn-__QJNoqmfscHVpQWKTium068IGXQMXpHaX_R2fdlamHdcE_4F_LJ-Jc5sDZLuA0gmBLOAGonLYMcf3JsYsZX1YWtT3jlAfHVAO3nX-KeFew/w640-h360/Screenshot%20from%202023-10-20%2012-25-55.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Xfce running on Xwayland rootful in GNOME Shell on Wayland</td></tr></tbody></table><br /><div class="separator" style="clear: both;"><br /></div><div class="separator" style="clear: 
both;">Unfortunately, not all the keyboard shortcuts within the nested X11 session actually work, because some of those (such a Alt-Tab for example) get processed by the Wayland compositor directly, instead of being forwarded to the nested environment.</div><p style="text-align: justify;">This however isn't a problem specific to Wayland or Xwayland, an X11 window manager running in <span style="background-color: #eeeeee; font-family: Source Code Pro;">Xnest</span> or <span style="background-color: #eeeeee;"><span style="font-family: Source Code Pro;">Xephyr</span></span> will have the same issues with keyboard shortcuts. To avoid that, <span style="background-color: #eeeeee; font-family: Source Code Pro;">Xephyr</span> is able to „grab“ the keyboard and pointer so that all input events end up in the nested X11 session and do not get processed by the parent session.<br /></p><p></p><p style="text-align: justify;">Xwayland 23.1 has a similar functionality using the Wayland pointer locking & confinement protocol and the keyboard shortcuts inhibitor protocol.</p><p style="text-align: justify;">So if your favorite Wayland compositor supports these protocols (in doubt, you can check that it is the case using „<span style="background-color: #eeeeee; font-family: Source Code Pro;">wayland-info</span>“), you can use the „<span style="background-color: #eeeeee; font-family: Source Code Pro;">-host-grab</span>“ option in Xwayland rootful:</p><div style="text-align: left;"><span style="background-color: #eeeeee; font-family: Source Code Pro;">$ Xwayland -geometry 1024x768 -decorate -host-grab :12 &</span></div><div style="text-align: left;"><span style="background-color: #eeeeee; font-family: Source Code Pro;"><span style="font-family: Source Code Pro;">$ </span><span style="font-family: Source Code Pro;">DISPLAY=:12 SESSION_MANAGER= GDK_BACKEND=x11 dbus-run-session startxfce4</span></span></div><div><p style="clear: both; text-align: center;"></p><p style="clear: both; 
text-align: justify;"><span style="text-align: left;">Pressing the Control and Shift keys simultaneously will release the keyboard and pointer (just like with </span><span style="background-color: #eeeeee; text-align: left;"><span style="font-family: Source Code Pro;">Xephyr</span></span><span style="text-align: left;"> actually).</span></p><h4>Using Xwayland rootful to run a single X11 application</h4><p style="text-align: left;">In some cases, it might be desirable to run a single X11 application isolated from the rest of the X11 clients, on its own X11 server.</p><p style="text-align: left;">On such a setup, one could run a single X11 client either maximized or fullscreen within Xwayland rootful.</p><p style="text-align: left;">Since Xwayland 23.2 allows to interactively resize the root window, users could mode and resize that window at will.</p><p style="text-align: left;">But for that to work, we need a simple X11 window manager that could resize the X11 client window along with the root window, using XRANDR notifications, such as the <a href="https://www.yoctoproject.org/software-item/matchbox/" target="_blank">matchbox</a> window manager for example.</p><div><span style="background-color: #eeeeee; font-family: Source Code Pro;">$ Xwayland -geometry 1024x768 -decorate :12 &</span></div><div><span style="background-color: #eeeeee;"><span style="font-family: Source Code Pro;">$ </span><span style="font-family: Source Code Pro;">matchbox-window-manager -display :12 &</span></span></div><div><span style="background-color: #eeeeee; font-family: Source Code Pro;">$ GDK_BACKEND=x11 midori --display=:12</span></div><div class="separator" style="clear: both;"><br /></div></div><p style="clear: both; text-align: center;"></p><p style="clear: both; text-align: justify;"><span style="text-align: left;">When the Xwayland rootful window is resized, corresponding XRANDR events are emitted, notifying the X11 window manager which in turn resizes the client 
window.</span></p><div class="separator" style="clear: both;"><h4>Using Xwayland rootful fullscreen</h4><p style="text-align: left;">For years now, Xwayland rootless had support for the <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/tree/main/stable/viewporter" target="_blank">viewport Wayland protocol</a>, to <a href="https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/321" target="_blank">emulate XRandR for legacy games</a> thanks to the work from <a href="https://hansdegoede.livejournal.com/" target="_blank">Hans De Goede</a>.</p><p style="text-align: left;">So the idea is to add a fullscreen mode to Xwayland rootful and take advantage of the Wayland viewports support to emulate resolution changes.</p><p style="text-align: left;">This is exactly what the „<span style="background-color: #eeeeee; font-family: Source Code Pro;">-fullscreen</span>“ command line options does, it starts Xwayland rootful in fullscreen mode using the <a href="https://gitlab.freedesktop.org/wayland/wayland-protocols/-/blob/main/stable/xdg-shell/xdg-shell.xml" target="_blank">xdg_toplevel</a> Wayland protocol and uses the existing viewport support to scale the window and to match the actual display physical resolution.</p><p style="text-align: left;">The emulated resolution is not even limited by the physical resolution, it's possible to use XRANDR to select an emulated resolution much higher than the actual monitor's resolution, quite handy to test X11 applications on high resolution without having to purchase expensive monitors!</p><div><span style="background-color: #eeeeee; font-family: Source Code Pro;">$ Xwayland -fullscreen :12 &</span></div><div><span style="background-color: #eeeeee;"><span style="font-family: Source Code Pro;">$ </span><span style="font-family: Source Code Pro;">matchbox-window-manager -display :12 &</span></span></div><div><span style="background-color: #eeeeee; font-family: Source Code Pro;">$ xterm -display :12 
&</span></div><div><span style="background-color: #eeeeee; font-family: Source Code Pro;">$ xrandr -s 5120x2880 -display :12</span></div><p style="clear: both; text-align: center;"></p><h4 style="clear: both; text-align: justify;">Are we done yet?</h4><p style="clear: both; text-align: justify;">Well, there's still one thing Xwayland is not handling well, it's HiDPI and fractional scaling.</p><p style="clear: both; text-align: justify;">With rootless Xwayland (as on a typical Wayland desktop session), all X11 clients share the same Xwayland server, and can span across different Wayland outputs of different scales.</p><p style="clear: both; text-align: justify;">Even though theoretically each Wayland surface associated with each X11 window could have a different scale factor set by Xwayland, all X11 clients on the same Xserver share the same coordinate space, so in practice different X11 windows cannot have different scale factors applied.</p><p style="clear: both; text-align: justify;">That's the reason why all the existing merge requests to add support for HiDPI to Xwayland set the same scale to all X11 surfaces. But that means that the rendered surface could end up being way too small depending on the actual scale the window is placed on, on a mixed-DPI multi-monitor setup (I already shared my views of the problem in <a href="https://gitlab.freedesktop.org/xorg/xserver/-/issues/1318" target="_blank">this issue upstream</a>).</p><p style="clear: both; text-align: justify;">But such limitation does not apply to rootful Xwayland, considering that all the X11 clients running on a rootful Xwayland actually belong to and remain within the same visible root window. They are part of the same visual entity and move all together along with the Xwayland rootful window.</p><p style="clear: both; text-align: justify;">So we could possibly add support for HiDPI (and hence achieve fractional scaling without blurred fonts) to rootful Xwayland. 
The idea is that Xwayland would set the surface scale to match the scale of the output it's placed on, and automatically resize its root window according to the scale, whenever that changes or when the rootful Xwayland window is moved from one monitor to another.</p><p style="clear: both; text-align: justify;">So for example, when Xwayland rootful with a size of 640×480 is moved from an output with scale 1 to an output with scale 2, the size of the root window (hence the Xwayland rootful window) would be automatically changed to 1280×960, along with the corresponding <span style="text-align: left;">XRANDR notifications so that an X11 window manager running nested can adjust the X11 clients size and positions.</span></p><p style="clear: both; text-align: justify;">And if we want a way to communicate that to the X11 clients running within Xwayland rootful, we can use an X11 property on the root window that reflects the actual scale factor being applied. An X11 client could either use that property directly, or more likely, a simple dedicated daemon could adjust the scaling factor of the various X11 toolkits depending on the value set for Wayland scaling.</p><p style="clear: both; text-align: justify;">That's what that proposed <a href="https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/1197" target="_blank">merge request upstream</a> does.</p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWpIpawTQ4gPpkXdPV9YGO65d61oXqHZ8sJehq2KQneoTsRrm4m7Dw1KAV_8iiAa3F6A_khPrrEbq0xGcpu-Q0i1CI6ewXKZDv0Uq67V20uGMnU2EzIhgYhhoSO4aY0RpKmvS0K8QOgZ2qyz9bycCaktOi5WVR8MUQ-pav3cjIxNhWkyRXLaI8dZjZ44k/s1920/Screenshot%20from%202023-11-03%2013-44-25.png" style="margin-left: auto; margin-right: auto;"><img border="0" height="225" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWpIpawTQ4gPpkXdPV9YGO65d61oXqHZ8sJehq2KQneoTsRrm4m7Dw1KAV_8iiAa3F6A_khPrrEbq0xGcpu-Q0i1CI6ewXKZDv0Uq67V20uGMnU2EzIhgYhhoSO4aY0RpKmvS0K8QOgZ2qyz9bycCaktOi5WVR8MUQ-pav3cjIxNhWkyRXLaI8dZjZ44k/w400-h225/Screenshot%20from%202023-11-03%2013-44-25.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">gnome-calculator running on Xwayland rootful with 150% fractional scaling</td></tr></tbody></table><br /><p style="clear: both; text-align: justify;">Of course, at this time of writing, this is just a merge request I just posted upstream, and there is no promise that it will accepted eventually. We'll see how that goes, but if that could find its way to Xwayland upstream, it would be part of the next major release of Xwayland some time next year.</p></div>2023-11-03T14:22:43+00:00Iago Toral: XDC 2023
https://blogs.igalia.com/itoral/2023/10/30/xdc-2023/
<p>I was at XDC 2023 in A Coruña a few days ago where I had the opportunity to talk about some of the work we have been doing on the Raspberry Pi driver stack together with my colleagues Juan Suárez and Maíra Canal. We talked about Raspberry Pi 5, CPU job handling in the Vulkan driver, OpenGL 3.1 support and how we are exposing GPU stats to user space. If you missed it, here is the link to the recording on <a href="https://www.youtube.com/watch?v=AnqUMhF7_xQ&t=11805s">YouTube</a>.</p>
<p>Big thanks to Igalia for organizing it, to all the sponsors, and especially to Samuel and Chema for all the work they put into making this happen.</p>2023-10-30T11:15:52+00:00Mike Blumenkrantz: 2024
https://www.supergoodcode.com/2024/
<p>🪑?</p>2023-10-27T00:00:00+00:00Mike Blumenkrantz: Readback
https://www.supergoodcode.com/readback/
<h1 id="and-now-for-something-slightly-more-technical">And Now For Something Slightly More Technical</h1>
<p>It’s a busy, busy week here. So busy I’m slipping on my blogging. But that’s okay, because here’s one last big technical post about something I hate.</p>
<p>Swapchain readback.</p>
<h1 id="so-easy-even-you-could-accidentally-do-it">So Easy Even You Could Accidentally Do It</h1>
<p>I’m not alone in drinking the haterade on this one, but GL makes it especially easy to footgun yourself by not providing explicit feedback that you’re footgunning yourself.</p>
<p>I recently encountered a scenario in <strong>REDACTED</strong> where this behavior was commonplace. The command stream looked roughly like this:</p>
<ul>
<li>draw some stuff</li>
<li>swapbuffers</li>
<li>blitframebuffer</li>
</ul>
<p>And this happened on every single frame (???).</p>
<h1 id="in-zink-terms">In Zink Terms…</h1>
<p>This isn’t pretty. Zink has an extremely conformant method of performing swapchain readback which definitely works without issues in all cases. I’d explain it, but it wouldn’t make either of us happy, and I’ve got so much other stuff to do that I couldn’t possibly… Oh, you really want to know? Well don’t say I didn’t warn you.</p>
<p>Vulkan doesn’t allow readback from swapchains. By this, I mean:</p>
<ul>
<li>swapchain images must be acquired before they can be accessed for any purpose</li>
<li>there is no method to explicitly reacquire a specific swapchain image</li>
<li>there is no guarantee that swapchain images are unchanged after present</li>
</ul>
<p>Combined, once you have presented a swapchain image you’re screwed.</p>
<p>…According to the spec, that is. In the real world, things work differently.</p>
<p>Zink takes advantage of this “real world” utilization to implement swapchain readback. In short, the only method available is to spam present/acquire on the swapchain until the last-presented image is reacquired. Then it can be read back, and the image data is (probably) the same as when it was presented.</p>
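<p>As a toy model of that present/acquire spam (a Python sketch under invented names — a real swapchain is a Vulkan object and its acquire order is implementation-defined, so this is illustrative only):</p>

```python
from collections import deque

class ToySwapchain:
    """Toy swapchain: acquire() hands out images in FIFO order and
    present() returns them to the pool. Purely illustrative."""
    def __init__(self, n_images=3):
        self._free = deque(range(n_images))

    def acquire(self):
        return self._free.popleft()

    def present(self, image):
        self._free.append(image)

def readback_last_presented(chain, last_presented):
    """Keep presenting/acquiring until the image we last presented
    comes back around; only then can its contents be read back."""
    attempts = 0
    while True:
        img = chain.acquire()
        attempts += 1
        if img == last_presented:
            return img, attempts  # a full sync has happened by now
        chain.present(img)  # not the one we want; cycle it back
```

<p>With a three-image chain, reacquiring the frame you just presented means cycling through every other image first, which is why this path implies a full sync.</p>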
<h1 id="p-e-r-f"><del>P E R F</del></h1>
<p>This is not a speedy method of implementing readback. It requires a full sync, and it was designed for the purpose of passing unit tests, which it does perfectly. Performance was never a concern, because why would anyone ever be trying to do readback in… Why would anyone ever be trying to do readback in a performance-sensitive… Using OpenGL, why would anyone ever be…</p>
<p>Anyway, this is very unperformant, and here at SGC we hate all things of that nature. Given that I had my real world scenario from <strong>REDACTED</strong> in which this was happening every frame, something had to be done.</p>
<p>This solution isn’t performant in the absolute sense either, but it’s massively faster than what was happening previously. Once zink detects an app repeatedly footgunning itself at full speed, it activates readback mode for a swapchain and maintains a staging copy of every frame. This enables the image data to be read back at any time without synchronization at the cost of an extra full-frame copy. This roughly doubles FPS in the case I was testing, which is pretty good.</p>
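<p>The detect-then-stage idea can be sketched like this (a Python sketch; the class, threshold, and method names are invented for illustration — the real logic lives in zink's C code):</p>

```python
class ReadbackModeTracker:
    """Once an app reads back repeatedly, flip into readback mode and
    keep a staging copy of every presented frame so later readbacks
    need no swapchain round-trip. Names/threshold are made up."""
    THRESHOLD = 2  # readbacks before assuming the app will keep doing it

    def __init__(self):
        self.readback_mode = False
        self._readbacks_seen = 0
        self._staging = None

    def present(self, frame_data):
        if self.readback_mode:
            # the extra full-frame copy, paid on every present
            self._staging = bytes(frame_data)

    def readback(self, current_frame):
        if self.readback_mode:
            return self._staging  # no synchronization needed
        self._readbacks_seen += 1
        if self._readbacks_seen >= self.THRESHOLD:
            self.readback_mode = True  # start staging from now on
        # slow path: stands in for the present/acquire spam
        return bytes(current_frame)
```

<p>The trade is explicit: one extra copy per frame buys readback without a full sync, which is the cheaper side of the bargain once an app reads back every frame.</p>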
<p>The functionality is <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25754">already merged</a> for the upcoming 23.3 release.</p>
<p>Footgun as hard as you want.</p>2023-10-26T00:00:00+00:00Mike Blumenkrantz: Crabformance
https://www.supergoodcode.com/crabformance/
<h1 id="more-milestones">More Milestones</h1>
<p>As everyone knows, Red Hat’s top RustiCL expert, Karol “But it’s only 10 o’clock?” Herbst, has been hard at work beating Mesa/Zink/RustiCL into shape. That effort continues to <a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25837">bear fruit</a>, and with the merge of an upcoming MR it should be possible to pass OpenCL conformance with zink on multiple platforms.</p>
<p>This will make zink <strong>THE FIRST EVER CONFORMANT VULKAN-BASED OPENCL IMPLEMENTATION</strong>.</p>
<p>Great work all around. For up-to-the-second progress reports on this ecosystem-critical topic, don’t forget to follow Karol on <a href="https://chaos.social/@karolherbst">social media</a>.</p>2023-10-25T00:00:00+00:00Simon Ser: Status update, October 2023
https://emersion.fr/blog/2023/status-update-57/
<p>Hi all, long time no see! It’s been more than two months since the last status
update. My excuse for this silence is two-fold: I was on leave for 5 weeks,
and then <a href="https://indico.freedesktop.org/event/4/">X.Org Developer’s Conference</a> happened. During my time off,
I’ve traveled in Korea and Japan. I will be blunt: these last two months have
been fantastic! And to be honest, that’s a huge understatement.</p>
<p><img alt="Busan view from Jangsan" src="https://pxscdn.com/public/m/_v2/1521/f1538e3aa-7b3151/szQZbtIs7rgT/mULRQYscLuW86sjRI6dHuMM89lLgaVUchxrPeGnn.jpg" /></p>
<p><img alt="East gate" src="https://pxscdn.com/public/m/_v2/1521/f1538e3aa-7b3151/EeHyXzOn2U1p/Kngw1IvQ8mMvnpFAGfMDGWDUkAzbbYIaAYB9lvnU.jpg" /></p>
<p>After my trip in Asia, I went to a 2-day Valve hackfest in Igalia’s
headquarters. I met other Valve contractors there, and we discussed various
topics such as color management, variable refresh rate, flicker-free startup,
and more.</p>
<p>At XDC, there were lots of interesting talks and workshops: HDR by Joshua and
Melissa, NVK by Faith, Asahi by Alyssa et al, wlroots frame scheduling by Rose
(my GSoC student), CI by Martin, VKMS by Maíra, Wine Wayland by Alexandros,
Wine X11 by Arek, and many more! Everything should be available online if you
haven’t watched them live. That said, as usual, the part I enjoyed the most is the
so-called hallway track. It’s great to have free-form discussions with fellow
graphics developers; it results in a pretty different train of thought than the
usual focused discussions we have online.</p>
<p>Apart from these events, I’ve found some time to do a bit of actual work, too.
I’ve re-spun an old patch I wrote to introduce a new <a href="https://lore.kernel.org/dri-devel/20231020101926.145327-2-contact@emersion.fr/">CLOSEFB</a> IOCTL, to
allow a DRM master to leave a framebuffer on-screen when quitting so that the
next DRM master can take over without a black screen in-between. This time I
also included a user-space patch and an IGT test (both requirements for new
kernel uAPI). I sent (and merged) <a href="https://lore.kernel.org/dri-devel/20231005131623.114379-1-contact@emersion.fr/">another kernel patch</a>
to fix black screens in some situations when unplugging USB-C docks.</p>
<p>On the Wayland side, I continued working on explicit synchronization, updating
the protocol and submitting a <a href="https://github.com/ValveSoftware/gamescope/pull/982">gamescope patch</a>. Joshua has been working on a
<a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25709">Mesa patch</a>, so all of the pieces are coming together now. On the SourceHut
side, I’ve sent a patch to add HTTP/2 support to <a href="https://pages.sr.ht">pages.sr.ht</a>. It’s been
merged and deployed, enjoy! The
<abbr title="New Project of the Two Months">NPotTM</abbr> is <a href="https://gitlab.freedesktop.org/emersion/libicc">libicc</a>, a small
library to parse ICC profile files. Unlike LittleCMS, it provides lower-level
access to the ICC structure and the exact color transformation operations.</p>
<p>That’s all for now, see you next month!</p>2023-10-24T22:00:00+00:00Hans de Goede: Fix Fedora IPU6 not working on Dell laptops with kernel 6.5
https://hansdegoede.livejournal.com/27809.html
The rpmfusion-packaged IPU6 camera stack for Fedora stops working on many Dell laptop models after upgrading the kernel to a 6.5.y kernel.<br /><br />This is caused by a new mainline ov01a10 sensor driver which takes precedence over the akmod ov01a10 driver but lacks VSC integration.<br /><br />This can be worked around by running the following command:<br /><pre class="" wrap="">sudo rm /lib/modules/$(uname -r)/kernel/drivers/media/i2c/ov01a10.ko.xz; sudo depmod -a</pre><br />After the rm + depmod run:<br /><pre class="" wrap="">sudo rmmod ov01a10; sudo modprobe ov01a10</pre><br />Or reboot. After this your camera will hopefully work again.<br /><br />I have submitted a <a href="https://gitlab.com/cki-project/kernel-ark/-/merge_requests/2762" rel="nofollow" target="_blank">pull-request</a> to disable the mainline kernel's non-working ov01a10 driver, so after the next Fedora kernel update this workaround should no longer be necessary.2023-10-24T12:51:40+00:00