planet.freedesktop.org
April 17, 2014
Behind the name GOM¹ is a simple idea: making it easier to save, load, update and query objects in an object store.

I'm not the main developer of this piece of code, but I contributed a large number of fixes to it while porting an existing piece of code to it as a test of the API. Much of the credit for the design of this very useful library goes to Christian Hergert.

The problem

It's possible that you've already implemented a data store inside your application, hiding your complicated SQL queries away in a separate file, each one an oversight away from an injection vulnerability. Or you've used the filesystem as the store, and thrown away the ability to search particular fields without loading everything into memory first.

Given that SQLite pretty much matches our use case - it offers good search performance, it's a popular and thus well-documented project, and its files can be manipulated through a number of first-party and third-party tools - wrapping its API to make it easier to use is probably the right solution.

The GOM solution

GOM is a GObject-based wrapper around SQLite. It hides SQL from you, but still allows you to drop down to it if you have a specific query you want to run. It also makes sure that SQLite queries don't block your main thread, which is particularly useful for UI applications.

For each table, you have a subclass of GomResource; each instance of that class represents a row in the table, and each column is a property on the object. To add a new item to the table, you would simply do:

item = g_object_new (ITEM_TYPE_RESOURCE,
                     "column1", value1,
                     "column2", value2,
                     NULL);
gom_resource_save_sync (item, NULL);

We have a number of features which try to make it as easy as possible for application developers to use gom, such as:
  • Automatic table creation for strings, string arrays and number types, as well as GDateTime, and transformation support for complex types (say, colours or images).
  • Automatic database version migration, using annotations on the properties ("new in version")
  • Programmatic API for queries, including deferred fetches for results (a rough sketch of what this looks like follows this list)
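
For example, finding rows by column value might look something like this. This is only a rough sketch based on my reading of the current API (error handling omitted); "repository" is assumed to be an already-open GomRepository, and the resource type and column names are the hypothetical ones from the example above:

GValue value = G_VALUE_INIT;
GomFilter *filter;
GomResourceGroup *group;
GError *error = NULL;

/* Find every item whose "column1" equals "some-value". */
g_value_init (&value, G_TYPE_STRING);
g_value_set_string (&value, "some-value");
filter = gom_filter_new_eq (ITEM_TYPE_RESOURCE, "column1", &value);
g_value_unset (&value);

group = gom_repository_find_sync (repository, ITEM_TYPE_RESOURCE,
                                  filter, &error);

/* The group initially only knows the number of results; the deferred
 * fetch below actually loads the items (here, all of them). */
gom_resource_group_fetch_sync (group, 0,
                               gom_resource_group_get_count (group),
                               &error);
g_object_unref (filter);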
Currently, the main cost in terms of lines of code when porting from raw SQLite is the verbosity of declaring properties with GObject. That will hopefully be fixed by the GProperty work planned for the next GLib release.

The future

I'm currently working on some missing features to support a port of the grilo bookmarks plugin (support for column REFERENCES).

I will also be making (small) changes to the API to allow changing the backend from SQLite to another one, such as XML or a binary format. Obviously the SQL "escape hatches" wouldn't be available with those backends.

Don't hesitate to file bugs if there are any problems with the API or its documentation, especially with respect to porting applications already using SQLite directly. Or if there are bugs (surely, no).

Note that JavaScript support isn't ready yet, due to limitations in gjs.

¹: « SQLite don't hurt me, don't hurt me, no more »
April 16, 2014

Things are moving forward for the Fedora Workstation project. For those of you who don’t know about it, it is part of a broader plan to refocus Fedora around 3 core products with a clear and distinct use case for each. The goal here is to have a clear definition of what Fedora is, and to have something that, for instance, ISVs can clearly identify and target with their products. At the same time it is trying to move away from the traditional distribution model, a model where you primarily take whatever comes your way from upstream, apply a little duct tape to try to keep things together, and ship it. That model was good in the early years of Linux’s existence, but it does not seem a great fit for what people want from an operating system today.

If we look at successful products like Mac OS X, the PlayStation 4, Android and ChromeOS, the common thread between them is that while they were all built on top of existing open source efforts, they didn’t just indiscriminately shovel in whatever open source code and projects they could find. Instead they decided upon the product they wanted to make and then cherry-picked the pieces out there that could help them with it, developing themselves what they couldn’t find good fits for. The same is to some degree true of Red Hat Enterprise Linux and Ubuntu. Both products, while based almost solely on existing open source components, have cherry-picked what they wanted and then developed the pieces they needed on top of them. For instance, for Red Hat Enterprise Linux the custom kernel has always been part of the value add offered: a Linux kernel with a core set of dependable APIs.

Fedora, on the other hand, has historically followed a path more akin to Debian’s, with a ‘the more the merrier’ attitude, trying to welcome anything into the group. A metaphor often used in the Fedora community to describe this state was that Fedora was like a collection of Lego blocks: if you had the time and the interest, you could build almost anything with it. The problem with this state was that the products you built also ended up feeling like the creations you make with a random box of Lego blocks: a lot of pointy edges, and some weird-looking sections where you had to solve issues with the pieces you had available, as opposed to the pieces most suited.

With the 3 products we are switching to a model where, although we start with that big box of Lego blocks, we add some engineering capacity on top of it, make some clear and hard decisions on direction, and actually start creating something that looks and feels like it was made to be a whole, instead of just assembled from a random set of pieces. So when we are planning the Fedora Workstation we are not just looking at what features we can develop for individual libraries or applications like GTK+, Firefox or LibreOffice; we are looking at what we want the system as a whole to look like. And maybe most importantly, we try our hardest to look at things from a feature/use-case viewpoint first, as opposed to a specific-technology viewpoint. So instead of asking ‘what features are there in systemd that we can expose/use in the desktop?’, the question becomes ‘what new features do we want to offer our users in future versions of the product, and what do we need from systemd, the kernel and others to be able to do that?’.

So while technologies such as systemd, Wayland, Docker and btrfs are on our roadmap, they are not there because they are ‘cool technologies’; they are there because they provide us with the infrastructure we need to achieve our feature goals. And what’s more, we make sure to work closely with the core developers to make the technologies what we need them to be. This means, for example, that between myself and other members of the team we are having regular conversations with people such as Kristian Høgsberg and Lennart Poettering, and of course contributing code where possible.

To explain our mindset with the Fedora Workstation effort, let me quickly summarize some old history. In 2001 Jim Gettys, one of the original creators of the X Window System, gave a talk at GUADEC in Seville called ‘Draining the Swamp’. I don’t think the talk can be found online anywhere, but he outlined some of the same thoughts in this email reply to Richard Stallman some time later. I think that presentation has shaped the thinking of the people who saw it ever since; I know it has shaped mine. Jim’s core message was that the idea that we can create a great desktop system by trying to work around the shortcomings or weirdness in the rest of the operating system is a total fallacy. If we look at the operating system as a collection of 100% independent parts, all developing at their own pace and with their own agendas, we will never be able to create a truly great user experience on the desktop. Instead we need to work across the stack, fixing the issues we see where they should be fixed, and through that ‘drain the swamp’. If we continued to try to solve the problems by adding layers upon layers of workarounds and abstraction layers, we would instead be growing the swamp, making it even more unmanageable. We are trying to bring that ‘draining the swamp’ mindset with us into creating the Fedora Workstation product.

With that in mind, what are the driving ideas behind the Fedora Workstation? The Fedora Workstation effort is meant to provide a first class desktop for your laptop or workstation computer, combining a polished user interface with access to new technologies. We are putting a special emphasis on developers with our first releases, both looking at how we can improve the desktop experience for developers, and looking at what tools we can offer developers to let them be productive as quickly as possible. And to be clear, when we say developers we are not only thinking about developers who want to develop for the desktop, or the desktop itself, but any kind of software developer or DevOps person out there.

The full description of the Fedora Workstation can be found here, but the essence of our plan is to create a desktop system that not only provides some incremental improvements over how things are done today, but that truly takes a fresh look at how a Linux desktop operating system should operate. The traditional distribution model, built up around software packages like RPM or Deb, has both its pluses and minuses. Its biggest challenge is probably that it creates a series of fiefdoms where 3rd party developers can’t easily target the system, or a family of systems, except by spending time very specifically supporting each one. And even once a developer decides to commit to trying to support a given system, it is not clear what system services they can depend on always being available, or what human interface design they should aim for. Solving these kinds of issues is part of our agenda for the new workstation.

So to achieve this we have decided on a set of core technologies to build this solution upon. The central piece of the puzzle is the so-called LinuxApps proposal from Lennart Poettering. LinuxApps is currently a combination of high level ideas and some concrete building blocks. The building blocks are technologies such as Wayland, kdbus, overlayfs and software containers. The ideas side includes developing a permission system, similar to what you see Android applications employ, to decide what rights a given application has, and developing defined, versioned library bundles that 3rd party applications can depend on regardless of the version of the operating system. On the container side we plan on expanding on the work Red Hat is doing with Docker and Project Atomic.

In terms of some of the other building blocks, I think most of you already know of the big push we are doing to get the new Wayland display server ready. This includes work on developing core infrastructure like libinput, a new library for handling input devices being developed by Jonas Ådahl and our own Peter Hutterer. There is also a lot of work happening on the GNOME 3 side of things to make GNOME 3 Wayland-ready. Jasper St. Pierre wrote up a great blog entry outlining his work to make GDM and the GNOME Shell work better with Wayland. It is an ongoing effort, but there is a big community around it, as most recently seen at the West Coast Hackfest at the Endless Mobile office.

As I mentioned, there is a special emphasis on developers for the initial releases. This includes both a set of small and big changes. For instance, we decided to put some time into improving the GNOME Terminal application, as we know it is a crucial piece of technology for a lot of developers and system administrators alike. Some of the terminal improvements can be seen in GNOME 3.12, but we have more features lined up for the terminal, including the return of translucency. But we are also looking at the tools provided in general, and the great thing here is that we are able to build upon a lot of efforts that Red Hat is developing for the Red Hat product portfolio, like Software Collections, which gives easy access to a wide range of development tools and environments. Together with Developers Assistant, this should greatly enhance the developer experience in the Fedora Workstation. The inclusion of Software Collections also means that Fedora becomes an even better tool than before for developing software that you expect to deploy on RHEL: an identical software collection will be available on RHEL, so you can have the exact same toolchain and toolchain versions available on both systems.

Of course, creating a great operating system isn’t just about the applications and shell, but also about supporting the kind of hardware people want to use. A good example here is the effort we put into HiDPI support. HiDPI screens are not very common yet, but a lot of the new high end laptops coming out are already using them. Anyone who has used something like a Google Pixel or a Samsung Ativ Book 9 Plus quickly comes to appreciate the improved sharpness and image quality these displays bring. Thanks to the effort we put in there, I have been very pleased to see many GNOME 3.12 reviews mention this work recently and say that GNOME 3.12 is currently the best Linux desktop for use with HiDPI systems because of it.

Another part of the puzzle for creating a better operating system is software installation. The traditional distribution model often tended to bundle as many applications as possible, as there was no good way for users to discover new software for their system. This is a brute force approach that assumes that if you checked the ‘scientific researcher’ checkbox, you want a random collection of 100 applications useful for ‘scientific researchers’ installed. To me this is a symptom of a system that does not provide a good way of finding and installing new applications. Thanks to the ardent efforts of Richard Hughes we have a new software installer that keeps going from strength to strength. It was originally launched in Fedora 19, but as we move forward towards the first Fedora Workstation release we are enabling new features and adding polish to it. One area where we need the wider Fedora community to work with us is increasing the coverage of appdata files. Appdata files essentially contain the metadata the installer needs to describe and advertise the application in question, including descriptive text and screenshots. Ideally upstreams should ship their own appdata file, but where they do not, we should add one to the Fedora package directly. Currently applications from the GTK+ and GNOME sphere have relatively decent appdata coverage, but we need more effort put into covering applications using other toolkits too.

Which brings me to another item of importance to the workstation. The Linux community has, for natural reasons, been very technical in nature, which has meant that some things that on other operating systems are not even a question have become defining traits on Linux. The choice of GUI development toolkit is one of these; it has been a great tool used by the open source community to shoot ourselves in the foot for many years now. While users of Windows or Mac OS X probably never ask themselves what toolkit was used to implement a given application, it seems to be a frequently asked question for Linux applications. We want to move away from that with the Workstation. While we do ship the GNOME Shell as our interface and use GTK+ for developing tools ourselves, including spending time evolving the toolkit itself, that does not mean we think applications written using, for instance, Qt, EFL or Java are evil and should be exorcised from the system. In fact, if an application developer wants to write an application for the Linux desktop at all, we greatly appreciate that effort, regardless of what tools they decide to use to do so. The choice of development toolkit is a choice meant to empower developers, not to create meaningless distinctions for the end user. So one effort we have underway is to work on the necessary theming and other glue code to make sure that if you run a Qt application under the GNOME Shell it feels like it belongs there, which also extends to accessibility-related setups like the high contrast theme. We hope to expand upon that effort both in width and in depth going forward.

And maybe on a somewhat related note, we are also trying to address the elephant in the room when it comes to the desktop, which is the fact that the importance of the traditional desktop is decreasing in favour of the web. A lot of things that you used to do locally on your computer you are probably just doing online these days, and a lot of the new things you have started doing on your computer or other internet-capable device are actually web services as opposed to local applications. The old Sun slogan of ‘The Network is the Computer’ is more true today than it has ever been. So we don’t believe the desktop is dead in any way, shape or form, as some of the hipsters in the media like to claim; in fact we expect it to stay around for a long time. What we do envision, though, is that the amount of time you spend on web apps will continue to grow, and that more and more of your computing tasks will be done using web services as opposed to local applications. Which is why we are continuing to deeply integrate the web into your desktop, be that through things like GNOME Online Accounts or the new web apps introduced in the software installer. And as I have mentioned before on this blog, we are also still working on improving the integration of Chrome and Firefox apps into the desktop along the same lines. So while we want the desktop to help you use the applications you want to run locally as efficiently as possible, we also realize that you, like us, are living in a connected world, and thus we need to give you easy access to your online life to stay relevant.

There are of course a lot of other parts to the Fedora Workstation effort, but this has already turned into a very long blog post as it is, so I’ll leave the rest for later. Please feel free to post any questions or comments and I will try to respond.

April 14, 2014
The 2014 "Journées du Logiciel Libre" took place in Lyon, like (almost) every year, this past weekend. It's a francophone free software event held over 2 days, with talks and plenty of exhibitors from local Free Software organisations. I made the 600-metre trip to the venue, and helped man the GNOME booth with Frédéric Peters and Alexandre Franke's moustache.



Our demo computer was running GNOME 3.12, using Fedora 20 plus the GNOME 3.12 COPR repository which was working pretty well, bar some teething problems.

We kept the great GNOME 3.12 video running in Videos, showcasing the video websites integration, and regularly demo'd new applications to passers-by.

The majority of people we talked to were pretty impressed by the path GNOME has taken since GNOME 3.0 was released: the common design patterns across applications, the iterative nature of the various UI elements, the hardware integration or even the online services integration.

The stand-out changes for users were the Maps application, which impressed people even though it is still a bit bare-bones, and the redesigned Videos.

We also spent time with a couple of users dispelling myths about the "lightness" of certain desktop environments or the "heaviness" of GNOME. We're constantly working on reducing resource usage in GNOME, be it sluggishness due to the way certain components work (with the applications binary cache), memory usage (cf. the recent gjs improvements), or battery usage (cf. my wake-up reduction posts). The use of gnome-shell on tablet-grade hardware for desktop machines shows that we can offer a good user experience on hardware that's not top-of-the-line.

Our booth was opposite the ones from our good friends from Ubuntu and Fedora, and we routinely pointed to either of those booths for people that were interested in running the latest GNOME 3.12, whether using the Fedora COPR repository or Ubuntu GNOME.

We found a couple of bugs during demos, and promptly filed them in Bugzilla, or fixed them directly. In the future, we might want to run a stable branch version of GNOME Continuous to get fixes for embarrassing bugs quickly (such as a crash when enabling Zoom in gnome-shell which made an accessibility enthusiast tut at us).


GNOME and Rhône

Until next year in sunny Lyon.

(and thanks Alexandre for the photos in this article!)
April 06, 2014

When I started to write the landing page for The Hacker's Guide to Python, I wanted to try new things at the same time. I read about A/B testing a while ago, and I figured it was a good opportunity to test it out.

A/B testing

If you do not know what A/B testing is, take a quick look at the Wikipedia page on the subject. Long story short, the idea is to serve two different versions of a page to your visitors and check which one has the most success. Once you know which version works better, you can switch to it permanently.

In the case of my book, I used that technique on the pre-launch page where people were able to subscribe to the newsletter. I didn't have a lot of things I wanted to test on that page, so I just used the approach on the subtitle, which was either "Learn everything you need to build a successful Python project" or "It's time you make the most out of Python".

Statistically, each version would be served half of the time, so both would get the same number of views. I would then build statistics about which page attracted the most subscribers, and with the results I would be able to switch definitively to the winning version of the landing page.

Technical design

My Web site, this Web site, is entirely static and served by Apache httpd. I didn't want to use any dynamic pages, languages or whatever, mainly because I didn't want to have something else to install and maintain on my server just for that.

It turns out that Apache httpd is powerful enough to implement such a feature. There are different ways to build it, and I'm going to describe my choices here.

The first thing to pick is a way to balance the display of the pages: if you get 100 visitors, around 50 should see version A of the page, and around 50 should see version B.

You could use a random number generator, picking a random number for each visitor to decide which page they are going to see. But it turns out that I didn't find a way to do that with Apache httpd at first sight.

My second thought was to use the client IP address. But that's not such a good idea, because visitors sitting behind, for example, a company firewall would all be served the same page, which kind of kills the statistics.

Finally, I picked time-based balancing: if you visit the page on an even second you get version A, and if you visit it on an odd second you get version B. Simple, and so far nothing proves there are more visitors on even seconds than on odd ones, or vice versa.

The next thing is to always serve the same page to a returning visitor; if a visitor comes back later and gets a different version, that's cheating. So the system should always serve the same page once a visitor has "picked" a version. That's simple enough to do: use a cookie to store which version the visitor has been assigned, and serve that version again when they come back.

Implementation

To do that in Apache httpd, I used the powerful mod_rewrite module that ships with it. I put 2 files in the books directory, named "the-hacker-guide-to-python-a.html" and "the-hacker-guide-to-python-b.html", which get served when you request "/books/the-hacker-guide-to-python".

RewriteEngine On
RewriteBase /books
 
# If there's a cookie called thgtp-pre-version set,
# use its value and serve the page
RewriteCond %{HTTP_COOKIE} thgtp-pre-version=([^;]+)
RewriteRule ^the-hacker-guide-to-python$ %{REQUEST_FILENAME}-%1.html [L]
 
# No cookie yet and…
RewriteCond %{HTTP_COOKIE} !thgtp-pre-version=([^;]+)
# … the number of seconds of the time right now is even
RewriteCond %{TIME_SEC} [02468]$
# Then serve the page A and store "a" in a cookie
RewriteRule ^the-hacker-guide-to-python$ %{REQUEST_FILENAME}-a.html [cookie=thgtp-pre-version:a:julien.danjou.info,L]
 
# No cookie yet and…
RewriteCond %{HTTP_COOKIE} !thgtp-pre-version=([^;]+)
# … the number of seconds of the time right now is odd
RewriteCond %{TIME_SEC} [13579]$
# Then serve the page B and store "b" in a cookie
RewriteRule ^the-hacker-guide-to-python$ %{REQUEST_FILENAME}-b.html [cookie=thgtp-pre-version:b:julien.danjou.info,L]


With those few lines, it worked flawlessly.
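
If you want to check the balancing and cookie behaviour by hand, a request like the following (hypothetical invocation, with the real page URL) should show the Set-Cookie header coming back:

curl -sI http://julien.danjou.info/books/the-hacker-guide-to-python | grep -i Set-Cookie

Note that curl doesn't keep cookies between invocations unless you pass it a cookie jar (-c/-b), so repeated runs will be re-balanced on the clock rather than served from the cookie.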

Results

The results were very good, in the sense that the system worked perfectly. Combined with Google Analytics, I was able to follow the score of each page. It turns out that testing this particular little piece of content was, as expected, pretty useless: the final score didn't allow me to pick a winner. Which also kind of proves that the balancing system worked as intended.

But it still was an interesting challenge!

April 03, 2014
During the wee hours of the morning, David Faure posted a new mime applications specification which will allow setting up per-desktop default applications: for example, watching films in GNOME Videos when in GNOME, but in DragonPlayer when in KDE. Up until now, this was implemented differently in at least KDE and GNOME, even to the point that GTK+ applications would use the GNOME defaults when running on a KDE desktop, and vice-versa.

This is made possible using XDG_CURRENT_DESKTOP, as implemented in gdm by Lars. This environment variable will also allow implementing more flexible OnlyShowIn and NotShowIn desktop entry fields (especially for desktops like Unity, implemented on top of GNOME, or GNOME Classic, implemented on top of GNOME) and desktop-specific GSettings/dconf configurations (again, very useful for GNOME Classic). The environment variable supports applying custom configurations in sequence (first GNOME Classic, then GNOME, in that example).
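
For instance, a GNOME Classic session would export something along these lines (an illustrative value, following the colon-separated convention), with consumers trying each entry in order:

XDG_CURRENT_DESKTOP=GNOME-Classic:GNOME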

Today, Ryan and David discussed the desktop file cache, which makes it faster to access desktop file data without hitting scattered files. The partial implementation used a custom structure but, after the many kdbus discussions earlier in the week, Ryan came up with a format based on serialised GVariant, the same format as kdbus messages (and one that can be implemented without a full GVariant parser).
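
To illustrate why that format is attractive, here is a minimal sketch of the idea (my own illustration, not the actual cache code; the a{ss} cache layout is invented for the example). Serialised GVariant data can be mapped straight from disk and read in place, with no parsing pass:

#include <gio/gio.h>

/* Sketch: look up an application name in a hypothetical cache file
 * storing a serialised GVariant dictionary of type a{ss}
 * (desktop file id -> application name). */
static gchar *
lookup_app_name (const gchar *cache_path, const gchar *desktop_id)
{
  GMappedFile *file = g_mapped_file_new (cache_path, FALSE, NULL);
  GVariant *cache;
  gchar *name = NULL;

  if (file == NULL)
    return NULL;

  /* No parse step: the mapped bytes are the data structure. */
  cache = g_variant_new_from_data (G_VARIANT_TYPE ("a{ss}"),
                                   g_mapped_file_get_contents (file),
                                   g_mapped_file_get_length (file),
                                   FALSE,
                                   (GDestroyNotify) g_mapped_file_unref,
                                   file);
  g_variant_ref_sink (cache);
  g_variant_lookup (cache, desktop_id, "s", &name);
  g_variant_unref (cache);

  return name;
}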

We also spent quite a bit of time writing out requirements for a filesystem notification API to support some of the unloved desktop use cases. Those use cases are currently not supported by either inotify or fanotify.

That ended our face-to-face meeting. Ryan and David led a Lunch'n'Learn in the SUSE offices for engineers excited about better application integration in the desktops, irrespective of toolkits.

Many thanks to SUSE for the accommodation as well as hosting the meeting in sunny Nürnberg. Special thanks to Ludwig Nussel for the morning biscuits :)
April 02, 2014
Wednesday, Mittwoch. Half of the hackfest has now passed, and we've started to move on to other discussion items that were on our to-do list.

We discussed icon theme related simplifications, especially for application developers and system integrators. As those changes would extend into bundle implementation, being pretty close to an exploded-tree bundle, we chose to postpone this discussion so that the full solution includes things like .service/.desktop merges, and Intents/Implements desktop keys.

David Herrmann helped me out with testing some Bluetooth hardware (which might have involved me trying to make Mario Strikers Charged work in a Wii emulator on my laptop ;)

We also discussed a full-fledged shared inhibition API, and we agreed that the best thing to do would be to come up with an API to implement at the desktop level. The desktop could then proxy that information to other session- and/or system-level implementations.

David Faure spent quite a bit of time cleaning up after my bad copy/pasted build system for the idle inhibit spec (I copied a Makefile with "-novalidate" as an option, and the XML file was full of typos and errors). He also fixed the KDE implementation of the idle inhibit to match the spec.

Finally, I spent a little bit of time getting kdbus working on my machine, as this seemed to trigger the infamous "hidden cursor bug" without fail on every boot. Currently wondering why gnome-shell isn't sending any events at all before doing a VT switch and back.

Due to the Lufthansa strike, and the long journey times, tomorrow is going to be the last day of the hackfest for most of us.
DisplayPort 1.2 Multi-Stream Transport is a feature that allows daisy-chaining of DP devices that support MST into all kinds of wonderful networks of devices. It's been on the TODO list for many developers for a while, but the hardware never quite materialised near a developer.

At the start of the week my local Red Hat IT guy asked me if I knew anything about DP MST. It turns out the Lenovo T440s and T540s docks have started to use DP MST: they have one DP port to the dock, and the dock has DP->VGA, DP->DVI/DP and DP->HDMI/DP ports on it, all using MST. So when they bought some of these laptops and plugged two monitors into the dock, it fell back to SST mode and only showed one image. This is not optimal, I'd call it a bug :)

Now I have a damaged-in-transit T440s (the display panel is in pieces) with a dock, and have spent a couple of days with the DP 1.2 spec in one hand (monitor), and a lot of my hair in the other. DP MST has a network topology discovery process built on sideband messages sent over the aux channel, which in normal DP is used to read/write a bunch of registers on the plugged-in device. You can then send aux channel messages encapsulated in sideband messages over the aux channel to read/write registers on other devices in the hierarchy!
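
To give an idea of what these messages look like on the wire, here is a rough sketch of encoding that discovery request, pieced together from my reading of the spec. The field layout and CRC polynomials are my interpretation, and this is illustrative userspace-style code, not what will land in the kernel:

#include <stdint.h>

/* CRC-4 (poly x^4+x+1) over the header nibbles, MSB first. */
static uint8_t crc4_nibbles(const uint8_t *data, int nibbles)
{
    uint16_t rem = 0;
    for (int bit = 0; bit < nibbles * 4 + 4; bit++) {
        rem <<= 1;
        if (bit < nibbles * 4)    /* shift in the data, then 4 zero bits */
            rem |= (data[bit / 8] >> (7 - (bit & 7))) & 1;
        if (rem & 0x10)
            rem ^= 0x13;
    }
    return rem & 0xf;
}

/* CRC-8 (poly 0xd5) over the message body. */
static uint8_t crc8_bytes(const uint8_t *data, int bytes)
{
    uint16_t rem = 0;
    for (int bit = 0; bit < bytes * 8 + 8; bit++) {
        rem <<= 1;
        if (bit < bytes * 8)
            rem |= (data[bit / 8] >> (7 - (bit & 7))) & 1;
        if (rem & 0x100)
            rem ^= 0xd5;
    }
    return rem & 0xff;
}

/* Encode a LINK_ADDRESS request for the branch device at the top of
 * the hierarchy (link count 1, so no relative address bytes), ready
 * to be written to the sideband DOWN_REQ registers over the aux ch. */
static int build_link_address_req(uint8_t *buf)
{
    buf[0] = (1 << 4) | 0;           /* Link_Count_Total=1, Link_Count_Remaining=0 */
    buf[1] = 2;                      /* no broadcast/path flags, body length 2 */
    buf[2] = (1 << 7) | (1 << 6);    /* Start & End of Message transaction, seqno 0 */
    buf[2] |= crc4_nibbles(buf, 5);  /* CRC over the 5 preceding header nibbles */
    buf[3] = 0x01;                   /* request id: LINK_ADDRESS */
    buf[4] = crc8_bytes(&buf[3], 1); /* body CRC */
    return 5;                        /* number of bytes to send */
}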

Today I achieved my first goal of correctly encoding the topology discovery message and getting a response from the dock:
[ 2909.990743] link address reply: 4
[ 2909.990745] port 0: input 1, pdt: 1, pn: 0
[ 2909.990746] port 1: input 0, pdt: 4, pn: 1
[ 2909.990747] port 2: input 0, pdt: 0, pn: 2
[ 2909.990748] port 3: input 0, pdt: 4, pn: 3

There are a lot more steps to take before I can produce anything, along with dealing with the fact that KMS doesn't handle dynamic connectors so well. It should make for a fun tangent away from the job I should be doing, which is finishing virgil.

I've ordered another DP MST hub that I can plug into AMD and nvidia GPUs, which should prove useful later, also for building deeper topologies and producing loops.

There are also some 4k monitors using DP MST, as they are really two panels, but I don't have one of those, so unless one appears I'm mostly going to concentrate on the Lenovo docks for now.

So the recent GNOME 3.12 release has gotten a very positive reception. Since I know that many members of my team have worked very hard on GNOME 3.12, I am delighted to see all the positive feedback the release is getting. And of course it doesn’t hurt that it gives us a flying start to the Fedora Workstation effort. Anyway, for the fun of it I tried putting together a set of press quotes, kind of like they tend to do for computer game advertisements.

  • “GNOME 3.12: Pixel perfect” “GNOME 3 has finally arrived” – The Register
  • “It is the GNOME release I have been waiting for” – Linux Action Show
  • “The Very Exciting GNOME 3.12 Has Been Released” – Phoronix.com
  • “…. a milestone feature update for users …” – eweek.com
  • “The design team has refined everything …” – omgubuntu.co.uk
  • “One of the big Linux desktops is updated” – TheInquirer
  • “High Resolution screens are best managed under Gnome 3.12” – laptopspirit.fr
  • “One of the most striking innovations..” – Heise.de
  • “has resurrected what was once the darling of the Linux desktop” – TechRepublic.com

Some of the quotes might feel a little out of context but, as I said, I did it for fun, and if you end up spending time reading GNOME 3.12 articles to verify the quotes, then all the better ;)

Also you should really check out the nice GNOME 3.12 release video that can be found on the GNOME 3.12 release page.

Anyway, I plan on doing a blog post about the Fedora Workstation effort this week and will talk a bit about how GNOME 3.12 and later fits into that.

April 01, 2014

It has been a long time in the making, but I have finally cut a new release of the Transmageddon transcoder application. The code inside Transmageddon has seen a major overhaul as I have updated it to take advantage of new GStreamer APIs and features. New features in this release include:

  • Support for files with multiple audio streams, allowing you to transcode them to different codecs or drop them from the new file
  • DVD ripping support. So now you can use your movie DVDs as input in Transmageddon. Be aware though that you need to install things like lsdvd and the GStreamer dvdread plugin from gst-plugins-ugly for it to become available, and you probably also want libdvdcss installed to be able to transcode most movie DVDs.
  • Another small feature of the release is that you can now set language information on files with one audio stream inside. I hope to extend this to also work with files that have multiple audio streams. If you rip a DVD with multiple audio streams Transmageddon will preserve the existing audio information, so in that case you shouldn’t need to set the language metadata manually.
  • Enabled VP9 support in the code.

There are some other smaller niceties too, like the use of blue default action buttons to match the GNOME 3 style better, and I also switched to a new icon designed by Jakub Steiner. There is also an appdata file now, which should make Transmageddon available in a nice way inside the new Fedora software installer.

Also there is now an Advanced section on the Transmageddon website explaining how you can create custom presets that allow you to do things like resize the video or change the bitrate of the audio.

And last, but not least here is a screenshot of the new version.
transmageddon-1.0-blue-button

You can download the new version from the Transmageddon website, I will update the version in Fedora soon.

Today, Ryan carried on with writing the updated specification for startup notification.

David Faure managed to get Freedesktop.org specs updated on the website (thanks to Vincent Untz for some chmod'ing), and removed a number of unneeded items in the desktop file specification, with help from Jérôme.

I fixed a number of small bugs in shared-mime-info, as well as preparing for an 8-hour train ride.

Lars experimented with techniques to achieve a high score at 2048, as well as discussing various specifications, such as the possible addition of an XDG_CURRENT_DESKTOP envvar. That last suggestion descended into a full-room eye-rolling session, usually when xdg-open code was shown.
So the release of the 3.14 Linux kernel has already happened, and I'm a bit late with our regular look at what cool stuff will land in the 3.15 merge window for the Intel graphics driver.

The first big thing is Ben Widawsky's support for per-process address spaces. We have long supported the per-process GTT page tables the hardware provides, but only with one address space. With Ben's work we can now manage multiple address spaces and switch between them, at least on Ivybridge and Haswell; support for Baytrail and Broadwell is still in progress. This finally allows multiple different users to use the GPU concurrently without accidentally leaking information between applications. Unfortunately there have still been a few issues with the code, so we had to disable this by default for 3.15.

Another really big feature was the much more fine-grained display power domain handling and a lot of runtime power management infrastructure work from Imre and Paulo. Intel gfx hardware has always had lots of automatic clock and power gating, but recent hardware started to have some explicit power domains which need to be managed by the driver. To do that we need to keep track of the power state of each domain, and if we need to switch something on we also need to restore the hardware state. On top of that, every platform has its own special set of power domains, and how the logical pieces are split up between them also changes. To make this all manageable we now have an extensive set of display power domains and functions to handle them, as an abstraction between the core driver code and the platform-specific power management backend. A lot of work has also happened to allow us to reuse parts of our driver resume/suspend code for runtime power management. Unfortunately the patches merged into 3.15 are all just groundwork; new platforms and features will only be enabled in 3.16.

Another long-standing nuisance with our driver was the take-over from the firmware configuration. With the reworked modesetting infrastructure we could take over the output routing, and with the experimental i915.fastboot=1 option we could eschew the initial modeset. This release, Jesse provided another piece of the puzzle by allowing the driver to inherit the firmware framebuffer. There's still more work to do to make fastboot solid and enable it by default, but we now have all the pieces in place for a smooth and fast boot-up.

There have been tons of patches for Broadwell all over the place, and we still have some features which aren't yet fully enabled, so there will be lots more to come. Other smaller features in 3.15 are improved support for framebuffer compression from Ville (again with more work still left), and 5.4GHz DisplayPort support, which is required to get 4k working. Unfortunately most 4k DP monitors seem to expose two separate screens and so need a driver with working MST (multi-stream transport), which we don't yet support. On that topic: the i915 driver now uses the generic DP aux helpers from Thierry Reding. Having shared code to handle all the communication with DP sinks should help a lot in getting MST off the ground. And finally I'd like to highlight the large cursor support, which should be especially useful for high-dpi screens.


And of course there have been fixes and small improvements all over the place, as usual.
March 31, 2014
I'm in Nürnberg this week for the Freedesktop Hackfest, aka the XDG Summit, aka the XDG Hackfest aka... :)

We started today with discussions about desktop actions, such as whether to show specific "Edit" or "Share" sub-menus, and how to implement them. We decided that these could be implemented through specific desktop keys which a file manager could use; this wasn't thought to be generally useful enough to require a specification for now.

The morning is stretching into a discussion of "splash screens". A desktop implementor running on low-end hardware is interested in having a placeholder window show up as soon as possible, in some cases even before the application has linked and the toolkit is available. This discussion is descending into slightly edge-case territory, such as text editors launching either new windows or new tabs depending on a number of variables.

Specific implementation options were discussed after a nice burrito lunch. We've decided that the existing X11 startup notification would be ported to D-Bus, using signals instead of X messages. Most desktop shells would support both versions for a while. Wayland clients that want startup notification would be required to use the D-Bus version of the specification. In parallel, we would start passing workspace information along with the DESKTOP_STARTUP_ID envvar/platform data.

Jérôme, David and I cleared up a few bugs in shared-mime-info towards the end of the day.

Many thanks to SUSE for the organisation, and accommodation sponsorship.

Update: Fixed a typo
March 30, 2014

Fifteen years ago today, March 29, 1999, I showed up at Sun’s MPK29 building for my first day of work as a student intern. I was off-cycle from the normal summer interns, since I wasn’t enrolled Spring semester but was going to finish my last two credits (English & Math) during the Summer Session at Berkeley. A friend from school convinced his manager at Sun to give me a chance, and after interviewing me, they offered to bring me on board for 3 months full-time, then dropping down to half-time when classes started.

I joined the Release Engineering team as a tools developer and backup RE in what was then the Power Client Desktop Software organization (to differentiate it from the Thin Client group) in the Workstation Products Group of Sun’s hardware arm. The organization delivered the X Window System, OpenWindows, and CDE into Solaris, and was in the midst of two big projects for the Solaris 7 8/99 & 11/99 releases: a project called “SolarEZ” to make the CDE desktop more usable and to add support for syncing CDE calendar & mail apps with Palm Pilot PDAs; and the project to merge the features from X11R6.1 through X11R6.4 (including Xinerama, Display Power Management, Xprint, and LBX) into Sun’s proprietary X11 fork, which was still based on the X11R6.0 release.

I started out with some simple bug fixes to learn the various build systems, before starting to try to write the scripts they wanted to simplify the builds. My first bug was to find out why every time they did a full build of Sun’s X gate they printed a copy of the specs. (To save trees, they’d implemented a temporary workaround of setting the default printer queue on the build machines to one not connected to a printer, but then they had to go delete all the queued jobs every few days to free disk space.) After a couple days of digging, and learning far too much about Imake for my own good, I found the cause was a merge error in the Imake configuration specs. The TroffCmd setting for how to convert troff documents to PostScript had been resynced to the upstream setting of psroff, but the flags were still the ones for Sun’s troff. These flags to Sun’s troff generated a PostScript file on disk, while they made psroff send the PostScript to the printer - switching TroffCmd back to Solaris troff solved the issue, so I filed Sun bug 4227466 and made my first commit to the Solaris X11 code base.

Six months later, at the end of my internship, diploma in hand, Sun offered to convert my position to a regular full-time employee, and I signed on. (This is why I always get my anniversary recognition in September, since they don’t count the internship.) Six months after that, the X11R6.4 project was finishing up and several of the X11 engineers decided to move on, making an opening for me to switch to the X11 engineering team as a developer. Like many junior developers joining new teams, I started out doing a lot of bug fixes, and some RFE’s, such as integrating Xvfb & Xnest into Solaris 9.

A couple years in, Sun’s attempts to reach agreement amongst all of the CDE co-owners to open source CDE had failed, resulting only in the not-really-open-source release of OpenMotif, so Sun chose to move on once again, to the GNOME Desktop. This required us to provide a lot of infrastructure in the X server and libraries that GNOME clients needed, and I got to take on some more challenging projects such as trying to port the Render extension from XFree86 to Xsun, assisting on the STSF font rendering project, adding wheel mouse support to Xsun, and working to ensure Xsun had the underlying support needed by GNOME’s accessibility projects. Unfortunately, the results of some of those projects were not that good, but we learned a lot about how hard it was to retrofit Xsun with the newly reinvigorated X11 development work and how maintaining our own fork of all the shared code was simply slowing us down.

Then one fateful day in 2004, our manager was on vacation, so I got a call from the director of the Solaris x86 platform team, who had been meeting with video card vendors to decide what graphics to include in Sun’s upcoming AMD64 workstations. The one consistent answer they’d gotten was that the time and cost to produce Solaris drivers went up and the feature set went down if they had to port to Xsun as well, instead of being able to reuse the XFree86 driver code they were already shipping for Linux. He asked what it would take to ship XFree86 on Solaris instead, and we discussed the options, then I talked with the other X engineers, and soon I was the lead of a project to replace the Solaris X server.

This was right at the time XFree86 was starting to come apart while the old X.Org industry consortium was trying to move X11 to an open development model, resulting in the formation of the X.Org Foundation. We chose to just go straight to Xorg, not XFree86, and not to fork as we’d done with Xsun, but instead to just apply any necessary changes as patches, and try to limit those to not changing core areas, so it was easier to keep up with new releases.

Thus, right at the end of the Solaris 10 development cycle we integrated the Xorg 6.8 X server for x86 platforms, and even made it the default for that release. We had no SPARC drivers ported yet, just all the upstream x86 drivers, so only provided Xsun in the SPARC release. The SPARC port, along with a 64-bit build for x64 systems, came in later Solaris 10 update releases. This worked out much better than our attempts to retrofit Xsun ever did.

Along with the Solaris 10 release, Sun announced that Solaris was going to be open sourced, and eventually as much of the OS source as we could release would be. But for the Solaris 10 release we only had time to add the Xorg server and the few libraries we needed for the new extensions. Most of the libraries and clients were still using the code from Sun’s old X11R6 fork, and we needed to figure out which changes we could publish and/or contribute back upstream. We weren’t ready to do this yet on the opening day of OpenSolaris.org, but I put together a plan for us to do so going forward, combining the tasks of migrating from our fork to a patched upstream code base, and the work of excising proprietary bits so it could be released as fully open source.

Due to my work with the X.Org open source community, my participation in the existing Solaris communities on Usenet and Yahoo Groups, and interest in X from the initial OpenSolaris participants, I was pulled into the larger OpenSolaris effort, culminating in serving 2 years on the OpenSolaris Governing Board. When the OpenSolaris distro effort (“Project Indiana”) kicked off, I got involved in it to figure out how to integrate the X11 packages to it. Between driving the source tree changes and the distro integration work for X, I became the Tech Lead for the Solaris X Window System for the Solaris 11 release.

Of course, the best laid plans are always beset by changes in the surrounding world, and these were no different. While we finished the migration to a pure open-source X code base in November 2009, it was shortly followed by the closing of Oracle’s acquisition of Sun, which eventually led to the shutdown of the OpenSolaris project. Since the X11 code is still open source and mostly from external sources, it is still published on solaris-x11.java.net, alongside other open source code included in Solaris. So when Solaris 11 was released in November 2011, while the core OS code was closed again, it included the open source X11 code with the culmination of our efforts.

Solaris 11 also included a few bug fixes I’d found via experimenting with the Parfait static analysis tool created by Sun Labs. Since shortly after starting in the X11 group, I’d been handling the security bugs in X. Having tools to help find those seemed like a good idea, and that was reinforced when we used Coverity to find a privilege escalation bug in Xorg. When the Sun Labs team came to demo parfait to us, I decided to try it out, and found and fixed a number of bugs in X with it (though not in security sensitive areas, just places that could cause memory leaks or crashes in code not running with raised privileges).

After the Oracle acquisition, part of adopting Oracle’s security assurance policies was using static analysis on our code base. With my experience in using static analyzers on X, I helped create the overall Solaris plan for using static analysis tools on our entire code base. This plan was accepted and I was asked to take on the role of leading our security assurance efforts across Solaris.

Another part of integrating Sun into Oracle was migrating all the bugs from Sun’s bug database into Oracle’s, so we could have a unified system, allowing our Engineered Systems teams to work together more productively, tracking bugs from the hardware through the OS up to the database & application level in a single system. Valerie Fenwick & Scott Rotondo had managed the Solaris bug tracking for years, and I’d worked with them, representing both the X11 and larger Desktop software teams. When Scott decided to move to the Oracle Database Security team, Valerie asked me to replace him, just as the effort to migrate the bugs was beginning. That turned into 2 years of planning, mapping, updating tools and processes, coordinating changes across the organization, traveling to multiple sites to train our engineers on the Oracle tools, and lots of communication to prepare our teams for the changes.

As we looked to the finish line of that work, planning was beginning for the Solaris 12 release. All of the above experiences led to me being named as the Technical Lead for the Solaris 12 release as a whole, covering the entire OS, coordinating and working with the tech leads for the individual consolidations. We’re two years into that now, and while I can’t really say much more yet than the official roadmap, I think we’ve got some solid plans in place for the OS.

So here it is now 15 years after starting at Sun, 14 after joining the X11 team. I still do a little work on X in my spare time (especially around the area of X security bugs), and officially am still in the X11 group, but most of my day is spent on the rest of Solaris now, between planning Solaris 12, managing our bug tracking, and ensuring our software is secure.

In those 15 years, I’ve learned a lot, accomplished a bit, and made a lot of friends. I’ve also:

  • filed 2572 bugs & enhancement requests in the Sun/Oracle bug database, and fixed or closed 2155, assuming I got the bug queries correct.
  • worked under 3 CEO’s (Scott, Jonathan, Larry), and 4 managers, but more VP’s and Directors than I can count thanks to Sun’s love of regular reorgs.
  • been part of the Workstation Products Group, the Webtop Applications group, the Software Globalization group, the User Experience Engineering group, the x64 Platform group, the Solaris Open Source group, and a few I’ve forgotten, again thanks to Sun’s love of regular reorgs.
  • been seen by my co-workers in a suit exactly 4 times (my initial job interview, and funerals for 3 colleagues we’ve lost over the years).

Where will I be in 15 more years? Hard to guess, but I bet it involves making sure everything is ready for Y2038, so I can retire in peace when that arrives.

For now, I offer my heartfelt thanks to everyone whose help made the last 15 years possible - none of it was done alone, and I couldn't have got here without a lot of you.

March 28, 2014

Those running Fedora Rawhide or GNOME 3.12 may have noticed that there is no Xorg.log file anymore. This is intentional: gdm now starts the X server so that it writes the log to the systemd journal. Update 29 Mar 2014: The X server itself has no capabilities for logging to the journal yet, but no changes to the X server were needed anyway. gdm merely starts the server with a /dev/null logfile and redirects stdin/stderr to the journal.

Thus, to get the log file use journalctl, not vim, cat, less, notepad or whatever your $PAGER was before.

This leaves us with the following commands.


journalctl -e /usr/bin/Xorg
Which would conveniently show something like this:

Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "wacom"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Lenovo Optical USB Mouse: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Integrated Camera: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Sleep Button: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Video Bus: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (II) evdev: Power Button: Close
Mar 25 10:48:41 yabbi Xorg[5438]: (II) UnloadModule: "evdev"
Mar 25 10:48:41 yabbi Xorg[5438]: (EE) Server terminated successfully (0). Closing log file.
The -e toggle jumps to the end and only shows 1000 lines, but that's usually enough. journalctl has a bunch more options, described in the journalctl man page. Note the PID in square brackets though: you can easily limit the output to just that PID, which makes it ideal for attaching the log to a bug report.

journalctl /usr/bin/Xorg _PID=5438
Previously the server kept only a single backup log file around, so if you restarted twice after a crash, the log was gone. With the journal it's now easy to extract the log file from that crash five restarts ago. It's almost like the future is already here.

With the release of GNOME-Software 3.12, it's also time for another update about what is going on behind the scenes of Appstream, the project providing all the data needed to make software-centers work. So here is the long overdue update :)

If you have attended my FOSDEM talk or seen the slides, you know about the concept of "component metadata" to describe (almost) all software components which make up a Linux system, as well as the public interfaces they provide for users and other software to access. This metadata specification was originally designed as part of the Listaller project, for use with the 3rd-party software installer.
But we already have the Appstream specification, which describes applications available in a distribution. And technically, applications are nothing but specialized components. So when I was asked about extending the Appstream spec with fonts and codecs, I decided to give up the separate Listaller components concept and merge it into the Appstream project, effectively creating a single source of metadata for all software in distributions.

Appstream and components were separate because Appstream was designed just for application metadata shown in software-centers, while component metadata was more technical and not user-visible. This separation, however, did not reflect reality: software-centers want to show the things components were designed for (codecs, fonts, …), and application meta-information can also contain technical items not shown in a software-center but useful for other functionality (e.g. for mimehandler finding). So, if you like clear naming of things (like I do), you can now think of the upcoming Appstream release 0.6 as Appstream no longer being a stream of application metadata, but rather metadata provided for use in applications. ;-)

Extending Appstream with components has a number of benefits:
First of all, users will soon get systems which are able to automatically install missing firmware, codecs, fonts and even libraries or Python modules. This happens in a distro-agnostic way, so any application can request installation of missing components through Appstream and PackageKit without having to care about distribution details.

For distributors, you get more information about the software upstream provides: Which public interfaces does it make available? What is the upstream homepage? Summary? Short release descriptions? Exposed library symbols? etc. This data was available before, of course, but it is machine-readable and standardized now. This will also allow us to match software across distributions quickly, allowing efficient exchange of patches between distributions, and answering questions upstreams have, like "in which distributions is my software available, and in which version?". Tracking upstream software is also easier now. Last but not least, the metadata format gives upstreams a small say in how their software gets packaged (= how it gets split / whether there is a metapackage depending on the split parts).

For upstream authors, it is much easier to maintain just a few Appstream XML metafiles, instead of adding a Listaller component file as well (which was not XML). Most data for creating the metainfo files is already available, and some of it can even be filled in automatically by the build-system.
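
To illustrate, a minimal metainfo file for a shared library might look roughly like this - note that this is a hypothetical sketch only, and tag names may still change before the final 0.6 release:

<?xml version="1.0" encoding="UTF-8"?>
<component>
  <id>libfoo</id>
  <name>Foo</name>
  <summary>A hypothetical library providing foo</summary>
  <url type="homepage">http://example.org/libfoo</url>
  <provides>
    <library>libfoo.so.2</library>
  </provides>
</component>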

With all the advantages this new system has comes a small disadvantage: These changes broke the backwards-compatibility of the Appstream spec, so to use Appstream data in version 0.6 or higher, you need to adjust your parsers. But of course you are using a library to access the data anyway, which already does that for you ;-) GNOME-Software 3.12 has almost complete support for the new (but still not released) Appstream 0.6 spec through its libappstream-glib library. The libappstream library in Appstream’s Git branch will get 0.6 support as well very soon. The only thing you need to do is adjust your code for the new library version, which broke its API as well due to a rewrite in C (I hope that we can provide language bindings again very soon).
Soon, there will also be a Qt library for accessing Appstream, since some people expressed a demand for it (and working with GObject in Qt C++ code always feels a little bit odd).
The 0.6 specification is still being fine-tuned to iron out some issues which might come up and mainly to improve the documentation and adjust all tools. It will be released in 2-3 weeks, and bring not only components support, but also lots of other small improvements and additions for better description of applications/components. In some cases, tag names have changed to have less ambiguous names.

Thanks to Richard Hughes for helping with the spec – our discussions about the Appstream spec are highly productive and in the end led to a better specification. Also, the new 0.6 specification includes many extensions used in GNOME-Software by default now, which makes it easier for other distributions to provide them for GS.

The current draft for the Appstream 0.6 specification can be found here. Some details might still change before the final release. As soon as the spec is final, there will be a ML announcement for distributors as well (watch the XDG distributions list), so there is enough time for distributors to future-proof their Appstream distro-data generators (I will also describe the changes in more detail then).

If you are confused now about the whole components stuff: Fear not! ;-) I will do some more detailed posts in a few weeks explaining to upstream projects how to provide metadata properly (after the 0.6 release). I will also finish the wiki page for Appstream AppData in KDE projects, where we can then use 0.6-style data in KDE right away.

March 26, 2014
1 new GNOME Videos, 1 updated Bluetooth panel, 2 new thumbnailers, 9 grilo sources, and 1 major UPower rework.

I'm obviously very attached to the GNOME Videos UI changes, the first major UI rework in its 12-year existence.


GNOME Videos watching itself

Go read the press release and follow the links to the release notes.
March 25, 2014

And done! It took me just 8 months to do this entire book project around Python, from the first day I started writing to today, where I finally publish and sell this book – almost entirely by myself. I'm really proud of what I've achieved so far, as this was something totally new to me.

Doing all of that has been a great adventure, and I promise I'll write something about it later on. A making-of.

For now, you can enjoy reading the book and learn a bit more about Python. I really hope it'll help you bring your Python-fu to a new level, and help you build great projects!

Go check it out, and since this is the first day of sale, enjoy 20% off by using the offer code THGTP20.

March 24, 2014

For a longer story of this issue, please read Adam Williamson's post. The below is the gist of it, mostly for archival purposes.

The Dell XPS13 (not the current Haswell generation, the ones before) ships with a touchpad that identifies as "CyPS/2 Cypress Trackpad". This touchpad is, by the looks of it, a ClickPad and identifies itself as such by announcing the INPUT_PROP_BUTTONPAD evdev property. In the X.Org synaptics driver we enable a couple of features for those touchpads, the most visible of which is the software-emulated button areas. If you have your finger on the bottom left and click, it's a left click, on the bottom right it's a right click. The size and location of these areas are configurable in the driver but also trigger a couple of other behaviours, such as extra filters to avoid erroneous pointer movements.

The Cypress touchpad is different: it does the button emulation in firmware. A normal clickpad will give you a finger position and a BTN_LEFT event on click. The Cypress touchpads will simply send a BTN_LEFT or BTN_RIGHT event depending on where the finger is located, but no finger position. Only once you move beyond some threshold will the touchpad send a finger position. This caused a number of issues when using the touchpad.

Fixing this is relatively simple: we merely need to tell the Cypress that it isn't a clickpad and hope that doesn't cause it some existential crisis. The proper way to do this is this kernel patch here by Hans de Goede. Until that is in your kernel, you can override it with a xorg.conf snippet:


Section "InputClass"+Section "InputClass"
Identifier "Disable clickpad for CyPS/2 Cypress Trackpad"
MatchProduct "CyPS/2 Cypress Trackpad"
MatchDriver "synaptics"
Option "ClickPad" "off"
EndSection
This snippet is safe to ship as a distribution-wide quirk.

March 22, 2014

Core Rendering with Glamor

I’ve hacked up the intel driver to bypass all of the UXA paths when Glamor is enabled so I’m currently running an X server that uses only Glamor for all rendering. There are still too many fall backs, and performance for some operations is not what I’d like, but it’s entirely usable. It supports DRI3, so I even have GL applications running.

Core Rendering Status

I’ve continued to focus on getting the core X protocol rendering operations complete and correct; those remain a critical part of many X applications and are a poor match for GL. At this point, I’ve got accelerated versions of the basic spans functions, filled rectangles, text and copies.

GL and Scrolling

OpenGL has been on a many-year vendetta against one of the most common 2D accelerated operations — copying data within the same object, even when that operation overlaps itself. This used to be the most performance-critical operation in X; it was used for scrolling your terminal windows and when moving windows around on the screen.

Reviewing the OpenGL 3.x spec, Eric and I both read the glCopyPixels specification as clearly requiring correct semantics for overlapping copy operations — it says that the operation must be equivalent to reading the pixels and then writing the pixels. My CopyArea acceleration thus uses this path for the self-copy case. However, the ARB decided that having a well defined blt operation was too nice to the users, so the current 4.4 specification adds explicit language to assert that this is not well defined anymore (even in the face of the existing language which is pretty darn unambiguous).

I suspect we’ll end up creating an extension that offers what we need here; applications are unlikely to stop scrolling stuff around, and GPUs (at least Intel) will continue to do what we want. This is the kind of thing that makes GL maddening for 2D graphics — the GPU does what we want, and the GL doesn’t let us get at it.

For implementations not capable of supporting the required semantic, someone will presumably need to write code that creates a temporary copy of the data.

PBOs for fall backs

For operations which Glamor can’t manage, we need to fall back to using a software solution. Direct-to-hardware acceleration architectures do this by simply mapping the underlying GPU object to the CPU. GL doesn’t provide this access, and it’s probably a good thing as such access needs to be carefully synchronized with GPU access, and attempting to access tiled GPU objects with the CPU requires either piles of CPU code to ‘de-tile’ accesses (ala wfb), or special hardware detilers (like the Intel GTT).

However, GL does provide a fairly nice abstraction called pixel buffer objects (PBOs) which work to speed access to GPU data from the CPU.

The fallback code allocates a PBO for each relevant X drawable, asks GL to copy pixels in, and then calls fb, with the drawable now referencing the temporary buffer. On the way back out, any potentially modified pixels are copied back through GL and the PBOs are freed.

This turns out to be dramatically faster than malloc’ing temporary buffers as it allows the GL to allocate memory that it likes, and for it to manage the data upload and buffer destruction asynchronously.

Because X pixmaps can contain many X windows (the root pixmap being the most obvious example), they are often significantly larger than the actual rendering target area. As an optimization, the code only copies data from the relevant area of the pixmap, saving considerable time as a result. There’s even an interface which further restricts that to a subset of the target drawable which the Composite function uses.

Using Scissoring for Clipping

The GL scissor operation provides a single clipping rectangle. X provides a list of rectangles to clip to. There are two obvious ways to perform clipping here — either perform all clipping in software, or hand each X clipping rectangle in turn to GL and re-execute the entire rendering operation for each rectangle.

You’d think that the former plan would be the obvious choice; clearly re-executing the entire rendering operation potentially many times is going to take a lot of time in the GPU.

However, the reality is that most X drawing occurs under a single clipping rectangle. Accelerating this common case by using the hardware clipper provides enough benefit that we definitely want to use it when it works. We could duplicate all of the rendering paths and perform CPU-based clipping when the number of rectangles was above some threshold, but the additional code complexity isn’t obviously worth the effort, given how infrequently it will be used. So I haven’t bothered. Most operations look like this:

Allocate VBO space for data

Fill VBO with X primitives

loop over clip rects {
    glScissor()
    glDrawArrays()
}

This obviously out-sources as much of the problem as possible to the GL library, reducing the CPU time spent in glamor to a minimum.

A Peek at Some Code

With all of these changes in place, drawing something like a list of rectangles becomes a fairly simple piece of code:

First, make sure the program we want to use is available and can be used with our GC configuration:

prog = glamor_use_program_fill(pixmap, gc,
                               &glamor_priv->poly_fill_rect_program,
                               &glamor_facet_polyfillrect);

if (!prog)
    goto bail_ctx;

Next, allocate the VBO space and copy all of the X data into it. Note that the data transfer is simply ‘memcpy’ here — that’s because we break the X objects apart in the vertex shader using instancing, avoiding the CPU overhead of computing four corner coordinates.

/* Set up the vertex buffers for the points */

v = glamor_get_vbo_space(drawable->pScreen, nrect * (4 * sizeof (GLshort)), &vbo_offset);

glEnableVertexAttribArray(GLAMOR_VERTEX_POS);
glVertexAttribDivisor(GLAMOR_VERTEX_POS, 1);
glVertexAttribPointer(GLAMOR_VERTEX_POS, 4, GL_SHORT, GL_FALSE,
                      4 * sizeof (GLshort), vbo_offset);

memcpy(v, prect, nrect * sizeof (xRectangle));

glamor_put_vbo_space(screen);

Finally, loop over the pixmap tile fragments, and then over the clip list, selecting the drawing target and painting the rectangles:

glEnable(GL_SCISSOR_TEST);

glamor_pixmap_loop(pixmap_priv, box_x, box_y) {
    int nbox = RegionNumRects(gc->pCompositeClip);
    BoxPtr box = RegionRects(gc->pCompositeClip);

    glamor_set_destination_drawable(drawable, box_x, box_y, TRUE, FALSE, prog->matrix_uniform, &off_x, &off_y);

    while (nbox--) {
        glScissor(box->x1 + off_x,
                  box->y1 + off_y,
                  box->x2 - box->x1,
                  box->y2 - box->y1);
        box++;
        glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, nrect);
    }
}

GL texture size limits

X pixmaps use 16 bit dimensions for width and height, allowing them to be up to 65536 x 65536 pixels. Because the X coordinate space is signed, only a quarter of this space is actually useful, which makes the useful size of X pixmaps only 32767 x 32767. This is still larger than most GL implementations offer as a maximum texture size though, and while it would be nice to just say ‘we don’t allow pixmaps larger than GL textures’, the reality is that many applications expect to be able to allocate such pixmaps today, largely to hold the ever increasing size of digital photographs.

Glamor has always supported large X pixmaps; it does this by splitting them up into tiles, each of which is no larger than the largest texture supported by the driver. What I’ve added to Glamor is some simple macros that walk over the array of tiles, making it easy for the rendering code to support large pixmaps without needing any special case code.

Glamor also had some simple testing support — you can compile the code to ignore the system-provided maximum texture size and supply your own value. This code had gone stale, and couldn’t work as there were parts of the code for which tiling support just doesn’t make sense, like the glyph cache, or the X scanout buffer. I fixed things so that you could leave those special cases as solitary large tiles while breaking up all other pixmaps into tiles no larger than 32 pixels square.

I hope to remove the single-tile case and leave the code supporting only the multiple-tile case; we have to have the latter code, and having the single-tile code around simply increases our code size for no obvious benefit.

Getting accelerated copies between tiled pixmaps added a new coordinate system to the mix and took a few hours of fussing until it was working.

Rebasing Many (many) Times

I’m sure most of us remember the days before git; changes were often monolithic, and the notion of changing how the changes were made for the sake of clarity never occurred to anyone. It used to be that the final code was the only interesting artifact; how you got there didn’t matter to anyone. Things are different today; I probably spend a third of my development time communicating how the code should change with other developers by changing the sequence of patches that are to be applied.

In the case of Glamor, I’ve now got a set of 28 patches. The first few are fixes outside of the glamor tree that make the rest of the server work better. Then there are a few general glamor infrastructure additions. After that, each core operation is replaced, one at a time. Finally, a bit of stale code is removed. By sequencing things in a logical fashion, I hope to make review of the code easier, which should mean that people will spend less time trying to figure out what I did and be able to spend more time figuring out if what I did is reasonable and correct.

Supporting Older Versions of GL

All of the new code uses vertex instancing to move coordinate computation from the CPU to the GPU. I’m also pulling textures apart using integer operations. Right now, we should correctly fall back to software for older hardware, but it would probably be nicer to just fall back to simpler GL instead. Unless everyone decides to go buy hardware with new enough GL driver support, someone is going to need to write simplified code paths for glamor.

If you’ve got such hardware, and are interested in making it work well, please take this as an opportunity to help yourself and others.

Near-term Glamor Goals

I’m pretty sure we’ll have the code in good enough shape to merge before the window closes for X server 1.16. Eric is in charge of the glamor tree, so it’s up to him when stuff is pulled in. He and Markus Wick have also been generating code and reviewing stuff, but we could always use additional testing and review to make the code as good as possible before the merge window closes.

Markus has expressed an interest in working on Glamor as a part of the X.org summer of code this year; there’s clearly plenty of work to do here, Eric and I haven’t touched the render acceleration stuff at all, and that code could definitely use some updating to use more modern GL features.

If that works as well as the core rendering code changes, then we can look forward to a Glamor which offers GPU-limited performance for classic X applications, without requiring GPU-specific drivers for every generation of every chip.

March 21, 2014

I don’t know if I’ve ever eaten my own dogfood that smells this risky.

A few days ago, I published patches to support dynamic page table allocation and tear-down in the i915 driver: http://lists.freedesktop.org/archives/intel-gfx/2014-March/041814.html. This work will eventually help us support expanded page tables (similar to how things work for normal Linux page tables). The patches rely on full PPGTT support, which still requires some work to get enabled by default. As a result, I’ll be carrying around this work for quite a while. The patches provide a lot of opportunity to uncover all sorts of weird bugs we’ve never seen, due to the more stressful usage of the GPU’s TLBs. To avoid the patches getting too stale, and to further the bug extermination, I figured, why not run it myself?

If you feel like some serious pain, or just want to help me debug it, give it a go – there should be absolutely no visible gain for you, only harm. You can either grab the patches from the mailing list, patchwork, or my branch.  Make sure to turn on full PPGTT support with i915.enable_ppgtt=2. If you do decide to opt for the pain, you can take comfort in the fact that you’re helping get the next big piece of prep work in place.

The question is, how long before I get sick of this terrible dogfood? I’m thinking by Monday I’ll be finished :D

March 20, 2014

We've had xorg.conf.d snippets for quite a while now (released with X Server 1.8 in April 2010). Many people use them as a single configuration that's spread across multiple files, but they also can be merged and rely on each other. The order they are applied is the lexical sort order of the directory, so I recommend always prefixing them with a number. But even within a single snippet, you can rely on the stacking. Let me give you an example:


$ cat /usr/share/X11/xorg.conf.d/10-evdev.conf
Section "InputClass"
Identifier "evdev touchpad catchall"
MatchIsTouchpad "on"
MatchDevicePath "/dev/input/event*"
Driver "evdev"
EndSection

$ cat /usr/share/X11/xorg.conf.d/50-synaptics.conf
Section "InputClass"
Identifier "touchpad catchall"
MatchIsTouchpad "on"
MatchDevicePath "/dev/input/event*"
Driver "synaptics"
EndSection
The first one applies the evdev driver to anything that looks like a touchpad. The second one, sorted later, overwrites this setting with the synaptics driver. Now, the second file also has a couple of other options:

Section "InputClass"
Identifier "Default clickpad buttons"
MatchDriver "synaptics"
Option "SoftButtonAreas" "50% 0 82% 0 0 0 0 0"
EndSection
This option says "if the device is assigned the synaptics driver, merge the SoftButtonAreas option". And we have another one, from ages ago (which isn't actually needed anymore):

Section "InputClass"
Identifier "Disable clickpad buttons on Apple touchpads"
MatchProduct "Apple Wireless Trackpad"
MatchDriver "synaptics"
Option "ClickPad" "on"
EndSection
This adds on top of the other two, provided your device has the name and is assigned the synaptics driver.

The takeaway of this is that when you have your own xorg.conf.d snippet, there is almost never a need for you to write more than a 5-line snippet merging exactly that one or two options you want. Let the system take care of the rest.
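
For example, a complete local snippet usually needs to look no more elaborate than this (the option here is purely illustrative - put in whatever you actually want to change):

Section "InputClass"
Identifier "my local touchpad tweaks"
MatchDriver "synaptics"
Option "ClickPad" "off"
EndSection

Drop it into /etc/X11/xorg.conf.d/ with a suitably high number prefix and the stacking takes care of the rest.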

To give the wider community a chance to see what happened during the GStreamer hackfest last weekend, I put together this blog post. It is based on a summary written by Wim Taymans, so a big thanks to Wim for letting me reuse parts of his summary.

So last weekend 21 GStreamer hackers got together at the Google office in Munich to spend the weekend hacking on their favourite GStreamer bits. At this point in time we didn’t have any major basic plumbing tasks that needed tackling so the time was spent hacking on a lot of different projects and using the opportunity to discuss design and challenges with each other.

We were 3 people attending from Red Hat and Fedora: Wim Taymans, Alberto Ruiz and myself.

With the release of GStreamer 1.0 in September 2012, we drastically changed the way memory is handled in the multimedia pipeline, and a large body of work still remains in exploring, improving and porting elements to this new memory model. We're also mostly working on improving the existing elements, with comparatively little new infrastructure work.

We’re also seeing a lot of people from different companies that contribute significant amounts of code to the official GStreamer repositories. This has traditionally been a much more closed effort with various pieces of code living in multiple repositories, especially for the hardware acceleration bits. It is good to see that the 1.0 series brings all these efforts together again with more coordination and a more coherent story.

HW acceleration

One of the large ongoing tasks is to improve our support for hardware accelerated decoding, effects and display. With 1.0 we can finally get this done cleanly and efficiently in very many use cases.

Matthew Waters flew in from Australia to move the gst-plugins-gl set of plugins into the core GStreamer plugins packages. He has been working on these plugins for a while now. Their goal is to use OpenGL to apply operations on the video, like rotation on a cube or applying a shader. With the 1.0 memory management it becomes possible to do this efficiently, with a minimal amount of texture uploads/downloads. More work is needed here; we can optimize things some more by delaying the work and running the shaders as part of the rendering operation.

Andoni Morales (Fluendo) has also been working on improving hardware acceleration on Android. He used some of the new features of 1.0 to make the Android codecs use zero-copy by implementing the texture-upload metadata on buffers. This allows the video sink to efficiently create a texture from the decoded data for display. Andoni also ported winks, a video capture source on Windows, to GStreamer 1.0.

Nicolas Dufresne (Collabora) has been working on adding a new set of decoders based on the mem2mem API in v4l2. Not many drivers provide this API yet but it is implemented in some Samsung Exynos SOCs. We would also like to support other m2m operations later, such as color conversion, but for that we need to make some of our base classes support the required asynchronous behaviour of mem2mem. The memory management in our v4l2 elements has gone through several iterations of improvements during the 1.0 cycle but it still is not entirely what it should be. We agreed on what we should do to fix this in the near future. We also briefly discussed the need for a new event that can be used to reclaim memory from a pipeline; many elements that use hardware buffers need to free those before they can negotiate a new format with the hardware, so we need a way to make that possible.

Mathieu Bourron (Collabora) has been working on libva, the library for GPU based video decoding and encoding on Intel hardware, and spent his time at the Hackfest fixing up the SPU overlay element to enable hardware accelerated subpicture overlays in the video sink. Traditionally GStreamer would use the CPU to overlay the subpictures (of DVD, for example) on top of the video images. With new GL-based sinks and hardware accelerated decoders this is very undesirable, and it can be done much more efficiently as part of the final rendering. In 1.0 we have the infrastructure to delay this overlay operation by attaching extra metadata (with the subpicture) to the video images when the video sink knows how to overlay them. We have been doing this with subtitles in cluttersink and other sinks for a while now and soon we can also do this with subpictures.

Plugin Hacking

Arun Raghavan, GStreamer hacker and PulseAudio maintainer, worked on disabling the audio and video filters in playbin when passthrough mode was selected. In passthrough mode, a video or audio sink can directly handle the encoded media (think a bluetooth headset that can handle mp3 directly, or a hardware sink that takes encoded data). He expanded on that work in a blog entry.

As a cool hack, Arun also made a source element to read from torrent files so you can watch a movie while you torrent it. He provides more information on that element in his blog, it is actually really cool.

Thiago Santos (Collabora) was continuing his work to improve the DASH demuxer, reworking the buffering code to make it buffer less and smoother. DASH is one of the new formats (with HLS and MSS) to stream media over HTTP while adapting to bandwidth changes. On the server side, it makes media available in various bitrates, while a client switches between bitrates depending on its measured network conditions. Andoni Morales also worked on a new dashsink element that implements the server side of the DASH format.

Mathieu Duponchelle, a former GSoC student, was trying to improve support for seeking in MPEG Transport Streams in order to use them in PiTiVi. Seeking in MPEG TS is not an easy thing because they are really optimized for streaming only. He got help from Thibault Saunier (Collabora), who was also hacking on PiTiVi and who was preparing a new release of gnonlin, GES and gst-python 1.2 (which he released on Sunday). Mathieu is one of the developers able to work fulltime on PiTiVi now thanks to the PiTiVi fundraiser, so be sure to contribute to that!

Jan Schmidt (Centricular), a long time GStreamer core hacker was working on debugging some DVB issues and also ended up taking part in a lot of the general design and troubleshooting discussions happening during the hackfest, helping other people move forward with their projects.

Long time GStreamer hacker Edward Hervey (Collabora) was planning to do a lot of DVB hacking but had to give up on that effort when it became clear that Google had signal-isolated the office for security reasons, so there was no DVB signal in the Google office. Instead he worked on merging some pending DVB patches and implemented GAP support in the mpeg transport stream plugin. GAP support deals with streams that have long periods of no media (like missing audio for some time in DVD). It makes sure that downstream elements keep processing the silence instead of waiting for more data.

Applications

Meg Ford, a GSoC student mentored by Sebastian Dröge (Centricular), was working on GNOME Sound Recorder, fixing up the last bugs and preparing it for a new release.

Myself, Christian Schaller (Red Hat), was on a bug fixing spree in Transmageddon (a transcoding application written in Python and GStreamer) and managed to reduce the number of known bugs to only 1. Fixed that last bug once I got home, so now I just need to hammer at Transmageddon for a bit to make sure I caught all the corner case issues, so I can do a major new release with new features such as handling files with multiple audio streams, handling DVD ripping, handling VP9 encoding, handling setting audio stream language information, reducing decoding overhead for streams that we are going to throw away, and more. Also had help reviewing and cleaning up the Transmageddon code from Alberto Ruiz, freeing Transmageddon from some ugly code that had survived many library updates and rewrites.

Alessandro Decina (Spotify) kept working on his patches to update the Firefox GStreamer backend to GStreamer 1.0. We hope to deploy this work in Fedora in the not too distant future. As a hack for the hackfest he provided patches to implement audio and video capture.

Wim Taymans (Red Hat) was hacking on a new library that can parse and generate MIKEY messages (RFC 3830). He wants to use this in the GStreamer RTSP server to negotiate SRTP (secure RTP) encryption parameters.

We had 2 people from the Swedish company AXIS, who provide network cameras that all run GStreamer and who contribute on a regular basis to the RTP and RTSP elements and libraries. Ognyan Tonchev was mostly writing some unit tests for RTSP and multicast handling in the RTSP server. Sebastian Rasmussen had been hacking on our watchdog element and the payloaders.

Infrastructure

Long time GStreamer hacker Stefan Sauer (Google) gave a demo of his idea for a tracing infrastructure in GStreamer. The idea is to place trace macros at strategic places that would send structured data to pluggable tracer modules. Some of the tracer modules could, for example, measure CPU usage of a plugin or measure the latency. The idea would be to gradually replace our extensive (but unstructured) logging with this new trace infrastructure. This would allow us to do new interesting things, like send the debug log to a remote machine or produce STF (Structured Trace Format) to analyse with standard tools. No immediate plans were made to merge this but there seems to be very little resistance to getting this merged soon.

Core hacker Sebastian Dröge (Centricular) has been going over the current stream selection ideas. One of the long outstanding issues is that of switching streams between different languages: you have a movie in different languages and you want to switch between them. To achieve low latency, old data should be kept around for the streams that are not currently selected, so that it can be quickly decoded and sent to the audio device on a switch. The idea is a combination of events to select a stream and to have the demuxer seek back in the stream on switches. No final conclusion or plan that can solve all requirements has been reached yet.

Also investigations have begun to make decodebin deal with renegotiations. For example, when a new stream is selected, we might need to use a different decoder for this stream but also when new input is received, decodebin should be able to reconfigure itself. The decodebin code is a complicated beast so any change to it should be done carefully.

GStreamer maintainer Tim-Philipp Müller (Centricular) spent his time merging the new device probing and monitoring API (written by Olivier Crête from Collabora) that had been sitting in bugzilla for a while now. The purpose is to be able to probe devices and their capabilities, such as v4l2 and ALSA devices. It’s also possible to be notified when devices appear and disappear in the system. An implementation for pulseaudio devices and another for v4l2 devices using gudev has been committed as well. This reimplements a feature that was in 0.10, but got cut from 1.0 due to us not being happy with the old design. One of the complications with that was the fact that we ran out of bits in one of our enums, so we needed to find a good solution for that.

We briefly discussed how to implement the SKIP seek flag. This extra flag can be used when doing fast forward or reverse, and instructs the decoders that they are allowed to throw away data to more efficiently perform the trick mode (at reduced accuracy). There was a prototype for AVI playback that I implemented once, which we discussed a bit. We'll see if someone takes up the task to finalize this work and implement SKIP mode in more demuxers.

I took some photos during the event to capture the spirit and put them on Google Plus for your viewing pleasure.

A big thank you to Google for hosting us and providing us with free lunch and free drinks through the weekend.

March 19, 2014
Enabling OpenGL 3.3 on radeonsi required some patches backported to llvm 3.4. I managed to get some time to do this and rebuilt mesa against the new llvm, so if you have an AMD GPU that is supported by radeonsi you should now see GL 3.3.

For F20 this isn't an option as backporting llvm is a bit tricky there, though I'm considering doing a copr that has a private llvm build in it; it might screw up some apps, but for most use cases it should be fine.

Update March 19 2014: this post is outdated, please read X.Org synaptics support for the Lenovo T440, T540, X240, Helix, Yoga, X1 Carbon instead.

The T440 has a rather unusual touchpad with the buttons painted on top of the touchpad rather than the bottom. In addition, the separate set of buttons for the trackstick has gone the way of the dodo. Moving the software-emulated buttons up on the touchpad is obviously quite important for trackstick users, but it throws up a bunch of problems. There are some limitations with the current synaptics X.Org driver: we can only have one region each designated for the right and the middle button. The rest of the touchpad is a left button click. In the case of the T440, the default Windows config has a right button up the top and another one at the bottom of the touchpad. An ASCII-art of that would look like this:


+----------------------------+
| LLLLLLLLLL MMMMM RRRRRRRRR |
|                            |
|                            |
|                            |
|                            |
|                            |
|                            |
| LLLLLLLL          RRRRRRRR |
+----------------------------+
We simply can't do that at the moment; the best we can do is split the touchpad so that the whole right side is a right-click, with a strip in the middle that is a middle click:

+----------------------------+
| LLLLLLLLLL MMMMM RRRRRRRRR |
| LLLLLLLLLL MMMMM RRRRRRRRR |
| LLLLLLLLLL MMMMM RRRRRRRRR |
| LLLLLLLLLL MMMMM RRRRRRRRR |
| LLLLLLLLLL MMMMM RRRRRRRRR |
| LLLLLLLLLL MMMMM RRRRRRRRR |
| LLLLLLLLLL MMMMM RRRRRRRRR |
| LLLLLLLLLL MMMMM RRRRRRRRR |
| LLLLLLLLLL MMMMM RRRRRRRRR |
+----------------------------+
I'm working on a solution for the proper config, but for now you'll have to be content with this.

The easiest approach for local configuration is a new InputClass section in the form:


Section "InputClass"
Identifier "t440 top buttons"
MatchDriver "synaptics"
# right btn|middle btn
Option "SoftButtonAreas" "60% 0 0 0 40% 60% 0 0"
EndSection
Drop that into /etc/X11/xorg.conf.d/99-t440-synaptics.conf and you're good to go.

The problem is finding a generic solution to this that we can ship in a distribution. That requires a two-step process. The touchpads look the same as all others; the only differentiator we have is the DMI information on the box. We can't check that in the xorg.conf snippets yet (Daniel Martin is working on a MatchDMI tag, but it won't happen until server 1.16). For now, we need a udev rule to help the xserver.


ACTION!="add|change", GOTO="touchpad_quirks_end"
KERNEL!="event*", GOTO="touchpad_quirks_end"
ENV{ID_INPUT_TOUCHPAD}!="1", GOTO="touchpad_quirks_end"

ATTR{[dmi/id]product_version}=="*T440*", \
ENV{ID_INPUT.tags}="touchpad_softbutton_top"

LABEL="touchpad_quirks_end"
If our product matches T440, we tag the touchpad, and that tag is something we can match against. Our revised InputClass section now looks like this:

Section "InputClass"
Identifier "t440 top buttons"
MatchDriver "synaptics"
MatchTag "touchpad_softbutton_top"
Option "SoftButtonAreas" "60% 0 0 0 40% 60% 0 0"
EndSection
I've pushed this configuration into Fedora now (rawhide, F20, F19), let's see what the feedback is. Having the whole right-side of the touchpad work as right button may cause a few issues so this is one change I may have to revert in the future.

This is a follow-up to my post from December Lenovo T440 touchpad button configuration. Except this time the support is real, or at least close to being finished. Since I am now seeing more and more hacks to get around all this I figured it's time for some info from the horse's mouth.

[update] I forgot to mention: synaptics 1.8 will have all these, the first snapshot is available here

Lenovo's newest series of laptops have a rather unusual touchpad. The trackstick does not have a set of physical buttons anymore. Instead, the top part of the touchpad serves as software-emulated buttons. In addition, the usual ClickPad-style software buttons are to be emulated on the bottom edge of the touchpad. An ASCII-art of that would look like this:


+----------------------------+
| LLLLLLLLLL MMMMM RRRRRRRRR |
|                            |
|                            |
|                            |
|                            |
|                            |
|                            |
| LLLLLLLL          RRRRRRRR |
+----------------------------+
Getting this to work required a fair bit of effort: patches to synaptics, the X server and the kernel, and a fair bit of trial-and-error. Kudos for getting all this sorted goes to Hans de Goede, Benjamin Tissoires, Chandler Paul and Matthew Garrett. And in the process of fixing this we also fixed a bunch of other issues that have been plaguing clickpads for a while.

The first piece in the puzzle was to add a second software button area to the synaptics driver. Option "SecondarySoftButtonAreas" now allows a configuration in the same manner as the existing one (i.e. right and middle button). Any click in that software button area won't move the cursor, so the buttons will behave just like physical buttons. Of course, we expect that button area to work out of the box, so we now ship configuration files that detect the touchpad and apply that automatically. This requires an xserver fix and a kernel/udev fix, more on that later.

The second piece in the puzzle was to work around the touchpad firmware. The touchpads speak two protocols, RMI4 over SMBus and PS/2. Windows uses RMI4, Linux still uses PS/2. Apparently the firmware never got tested for PS/2 so the touchpad gives us bogus data for its axis ranges. A kernel fix for this is in the pipe.

Finally, the touchpad needed to be actually usable. So a bunch of patches that tweak the clickpad behaviours were merged in. If a finger is set down inside a software button area, finger movement no longer affects the cursor. This stops the ever-so-slight but annoying movements when you execute a physical click on the touchpad. Also, there is a short timeout after a click to avoid cursor movement when the user just presses and releases the button. The timeout is short enough that if you do a click-and-hold for drag-and-drop, the cursor will move as expected. If a touch started outside a software button area, we can now use the whole touchpad for movement. And finally, a few fixes to avoid erroneous click events - we'd sometimes get the software button wrong if the event sequence is off.

Another change altered the behaviour of the touchpad when it is disabled through the "Synaptics Off" property. If you use syndaemon to disable the touchpad while typing, the buttons now work even when the touchpad is disabled. If you don't like touchpads at all and prefer to use the trackstick only, use Option "TouchpadOff" "1". This will disable everything but physical clicks on the touchpad.

On that note I'd also like to mention another touchpad bug that was fixed in the recent weeks: plenty of users reported synaptics having a finger stuck after suspend/resume or sometimes even after logging in. This was an elusive bug and finally tracked down to a mishandling of SYN_DROPPED events in synaptics 1.7 and libevdev. I won't provide a fix for synaptics 1.7 but we've fixed libevdev - please use synaptics 1.8 RC1 or later and libevdev 1.1 RC1 or later.

The kernel fix is more complicated. The problem is that while the touchpads all have a unique PNPID, that ID is not easily accessible because it shows up on a device that is not a parent of the device we see in the driver. Hence MatchPnPID doesn't apply. Matthew Garrett had a preliminary patch for this but it turned out to break some other use-case so we're back at square one at the moment. For now, I'm using a couple of udev rules together with the MatchTag configuration:


ATTR{[dmi/id]product_version}=="*T540*", ENV{ID_INPUT.tags}="T540"
and with the matching xorg.conf snippet:

Section "InputClass"
Identifier "Lenovo T540 trackstick software button buttons"
MatchTag "T540"
Option "SecondarySoftButtonAreas" "3363 0 0 2280 2717 3362 0 2280"
EndSection
Oh btw, did you notice the magic numbers here? This really should be in percent of the touchpad, but we don't have the kernel patch. So for now I just ship the absolute numbers in the Fedora packages and glare pointedly at everyone who thinks that's a permanent solution.

Fedora users: everything is being built in rawhide and I have a F20 Copr that I'll keep pushing to once I get the various other bits for F20 sorted.

In November I was lamenting the lack of selection in credible Haswell-powered laptops for Mesa development. I chose the 15" MacBook Pro, while coworkers picked the 13" MBP and the System76 Galago Pro. After using the three laptops for a few months, I review our choices and whether they panned out like we expected.

                 CPU             RAM     Graphics         Screen           Storage       Battery
13" MacBook Pro  2.8 GHz 4558U   16 GiB  GT3 - 1200 MHz   13.3" 2560x1600  512 GiB PCIe  71.8 Wh
15" MacBook Pro  2.0 GHz 4750HQ  16 GiB  GT3e - 1200 MHz  15.4" 2880x1800  256 GiB PCIe  95 Wh
Galago Pro       2.0 GHz 4750HQ  16 GiB  GT3e - 1200 MHz  14.1" 1920x1080  many options  52 Wh

15" MacBook Pro

The installation procedure on the MacBook was very simple. I shrunk the HFS partition from OS X and installed rEFInd, before following the usual Gentoo installation.

Quirks and Annoyances

Running Linux on the MacBook is a good experience overall, with some quirks:

  • the Broadcom BCM4360 wireless chip is supported by a proprietary driver (net-wireless/broadcom-sta in Gentoo)
  • the high DPI Retina display often necessitates 150~200% zoom (or lots of squinting)
  • the keyboard causes some annoyances:
    • the function keys operate only as F* keys when the function key is held, making common key combinations awkward (behavior can be changed with the /sys/module/hid_apple/parameters/fnmode file).
    • there's no Delete key, and Home/End/Page Up/Page Down are function+arrow key.
    • the power button is a regular key immediately above backspace. It's easy to press accidentally.
  • the cooling fans don't speed up until the CPU temperature is near 100 C.
  • no built-in Ethernet. Seriously, we've reinvented how many mini and micro HDMI and DisplayPort form factors, but we can't come up with a way to rearrange eight copper wires to fit an Ethernet port into the laptop?

Worst Thing: Insufficient cooling

The worst thing about the MacBook is the insufficient cooling. Even forcing the two fans to their maximum frequencies isn't enough to prevent the CPUs from thermal throttling in less than a minute of full load. Most worrying is that my CPU's core #1 seems to run significantly hotter under load than the other cores. It's always the first, and routinely the only, core to reach 100 C, causing the whole CPU package to be throttled until it cools slightly. The temperature gradient across a chip only 177 square millimeters is also troubling: frequently core #1 is 15 C hotter than core #3 under load. The only plausible conclusion I've come to is that the thermal paste isn't applied evenly across the CPU die. And since Apple uses tamper-resistant screws, I couldn't reapply the thermal paste without special tools (and probably voiding the warranty).

Best Thing: Retina display

I didn't realize how much the Retina display would improve the experience. Having multiple windows (that would have been close to full screen at 1080p) open at once is really nice. Being able to have driver code open on the left half of the screen, and the PDF documentation open on the right makes patch review quicker and more efficient. I've attached other laptops I've used to larger monitors, but I've never even felt like trying with the 15" MBP.

13" MacBook Pro

I consider the 13" MacBook Pro to be strictly inferior (okay, lighter and smaller is nice, but...) to the 15". Other than the obvious differences in the hardware, the most disappointing thing I've discovered about it is that the 13" screen isn't really big enough to be comfortable for development. The coworker that owns it plugs it into his physically larger 1080p monitor when he gets to the office. For a screen that's supposed to be probably the biggest selling point of the laptop, it's not getting a lot of use.

As I mentioned, I'm perfectly satisfied with the 15" screen for everyday development.

System76 Galago Pro

I used the Galago Pro for about three weeks before switching to the 15" MacBook. All in all, it's a really compelling system, except for the serious lack of attention to detail.

Quirks and Annoyances

  • although it has built-in Ethernet (yay!), the latch mechanism will drive you nuts. Two hands are necessary to unplug an Ethernet cable from it, and three are really recommended.
  • the single hinge attaching the screen feels like a failure point, and the screen itself flexes way too much when you open or close the laptop.
  • all three USB ports are on the right side, which can be annoying if you want to use a mouse, which you will, because...
  • the touchpad doesn't behave very well. In fairness, this is probably mostly the fault of the synaptics driver or the default configuration.

Worst Thing: Keyboard

The keyboard is probably the worst part. The first time I booted the system, typing k while holding the shift key wouldn't register a key press. Lower case k typed fine, but with shift held - nothing. After about 25 presses, it began working without any indication as to what changed.

The key stroke is very short, you get almost no feedback, and if you press the keys at an angle slightly off center they may not register. Typing on it can be a rather frustrating experience. Beyond it being a generally unpleasant keyboard, the function key placement confirms that the keyboard is a complete afterthought: Suspend is between Mute and Volume Down. Whoops!

Best Thing: Cooling

The Galago Pro has an excellent cooling system. Its fans are capable of moving a surprising amount of air and don't make too much noise doing it. Under full load, the CPU's temperature never passed 84 C - 16 C cooler than the 15" MBP (and the MBP doesn't break 100 C only because it starts throttling!). On top of not scorching your lap during compiles, the cooler temperatures mean the CPU and GPU are going to be able to stay in turbo mode longer and give better performance.

Final thoughts

Concerns about the keyboard and general build quality of the Galago Pro turned out to be true. I think it's possible to get used to the keyboard, and if you do I feel confident that the system is really nice to use (well, I guess you have to get used to the other input device too).

I'm overall quite happy with the MacBook Pro. The Retina display is awesome, and the PCIe SSD is incredibly fast. I was most worried about the 15" MacBook overheating and triggering thermal throttling. Unfortunately this was well founded. Other than the quirks, which are par for the course, the overheating issue is the one significant downside to this machine.

March 18, 2014

Students, this Friday at 1900 UTC is the deadline to apply for this year’s GSoC. It’s an awesome program that pays you to work on open-source projects for a summer (where you == a university/college student).

It’s by no means too late, but start your application today. You can find more information on Gentoo’s projects here (click on the Ideas page to get started; also see our application guidelines) and on the broader GSoC program here.

Good luck!


Tagged: community, development, gentoo, gsoc
March 10, 2014

There's a million tutorials out there on how to learn git. This isn't one of them. I'm going to assume that you learned git a while ago, you've been using it a bit and you're generally familiar with its principles. What I'm going to show is a couple of things that improved my workflow. Chances are, they will improve yours too. This isn't a tutorial though. I'm just pointing you in the direction of things; you'll have to learn how to use them yourself.

Use tig

Seriously. Don't tell me you use gitk or git log is good enough for you. Use tig. tig is to git log what mutt is to mail(1). It has been the source of the biggest efficiency increase for me. Screenshots don't do it justice because the selling point is that it is interactive. But anyway, here are some official screenshots: tig blame shows you the file and the commits, you just need to select the line, hit enter and you see the actual commit. The main view by default shows you tags, branch names, remote branch names, etc. So not only do you immediately know which branch you're on, you will see local branches that have been merged, tags that have been applied, etc. It gives you an awareness that git log doesn't. Do yourself a favour, install it, use it for a day or two and I'm pretty sure you won't go back.

tig also supports custom configurations. Here is my $HOME/.tigrc:


bind generic X !git cherry-pick -x %(commit)
bind generic C !git cherry-pick %(commit)
bind generic R !git revert %(commit)
bind generic E !git format-patch -1 %(commit)
bind generic 0 !git checkout %(commit)
bind generic 9 !git checkout %(commit)~
bind generic A !git commit --amend -s
bind generic S !git show %(commit)
So with a couple of key strokes I can cherry-pick, export patches, revert, check out a single tree, etc. Especially cherry-picking is extremely efficient: check out the target branch, run "tig master", then simply select each commit, hit "C" or "X", and done.

Use branches

Anytime it takes you more than 5 minutes to fix an issue, create a new branch. I'm getting torn between multiple things all the time. I may spend a day or two on one bug, then it's back to another, unrelated issue. With the review requirements on some projects I may have multiple patches waiting for feedback, but I can't push them yet. Hence - a branch for each feature/bugfix. master is reserved for patches that can be pushed immediately.

This approach becomes particularly useful for fixes that may need some extra refactoring. You start on a feature-based branch, but halfway through realise you need a few extra patches to refactor things. Those are easy to review so you send them out to gather reviews, then cherry-pick them to master and push. Back to your feature branch, rebase and you're done - you've managed two separate streams of fixes without interference. And most importantly, you got rid of a few patches that you'd otherwise have to carry in your feature branch.

Of course, it takes a while to get used to this and it takes discipline. It took me a few times before I really managed to always work like this, but the general rule for me is now: if I'm hacking on the master branch, something is off. Remember: there's no real limit to how many branches you can create - just make sure you clean them up when you're done to keep things easy for your brain.

Use the branch names to help you. You can rename branches (git branch -m), so I tend to name anything that's a bigger rewrite with "wip/somefeature", whereas normal bug fixes go on branches with normal names. And because I rebase local feature branches it doesn't matter what I name them anyway, the branches are deleted once I merge them. Branches where I do care about the branch history (i.e. those I pull into master with a merge commit) I rename before pulling to get rid of the "wip" prefix.
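
A sketch of that lifecycle, with a hypothetical branch name:

git checkout -b wip/clickpad-rework   # bigger rewrite, hence the prefix
# ... hack, commit, amend, rebase until it's ready ...
git branch -m wip/clickpad-rework clickpad-rework
git checkout master
git merge --no-ff clickpad-rework     # keep the branch history
git branch -d clickpad-rework         # clean up afterwards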

Use branch descriptions

Hands up if you have a "devel" branch from 4 months ago. Hands up if you still remember what the purpose of that branch was. Right, I didn't think so. git branch --edit-description fires up an editor and lets you add a description for the branch. Sometimes a single sentence is enough to refresh your memory. Most importantly: when you task-switch to a different feature, edit the description to note where you left off, what the plan was, etc. This reduces the time to get back to work. git config branch.<branchname>.description shows you the description for the matching branch.

I even have a git hook to nag me when I check out a branch without a description. Note that branch descriptions are local only, they are not pushed to the remote.
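
For example (the branch name and note are hypothetical):

git checkout wip/clickpad-rework
git branch --edit-description    # jot down where you left off and what's pending
git config branch.wip/clickpad-rework.description   # print it again later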

Amend and rebase until the cows come home

The general rule: what is committed, doesn't get lost. At least not easily, it is still in the git reflog. So commit when you think you're done. Then review, test, add, and git commit --amend. That typo you made in line 4 - edit and amend. I have shell aliases for amend, rbs (git rebase -i) and rbc (git rebase --continue), and almost every commit goes through at least 3 amends (usually one because I missed something, one for that typo, one for commit log message editing). Importantly: it doesn't matter how often you amend. Really. This is local only, no-one cares. The important thing is that you get to a good patch set, not that you get there with one commit.
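
My aliases look something like this (the exact definitions here are a sketch, but the names match the ones above):

alias amend='git commit --amend'
alias rbs='git rebase -i'
alias rbc='git rebase --continue'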

git commit --amend only modifies the last commit; to go back and edit the past, you need to rebase. So, you need to

Learn how to rebase

Not just the normal git rebase, the tutorials cover that. Make sure you know how to use git rebase --interactive. Make sure you know how to change the ordering of commits, how to delete commits, how to abort a rebase. Make sure you know how to squash two commits together and what the difference is between squash and fixup. I'm not going to write a tutorial on that, because the documentation is easy enough to find. Simply take this as a hint that the time you spend learning how to rebase pays off. Also, you may find git squash interesting.

And remember: even if a rebase goes bad, the previous state is still in the reflog. Which brings me to:

Learn how to use the reflog

The git reflog is the list of changes in reverse chronological order of how they were applied to the repository, regardless of what branch you're on. So HEAD@{0} is always "whatever we have now", HEAD@{1} is always "the repository before the last command". This doesn't just mean commits, it remembers any change. So if you switch from branch A to branch B, commit something, then switch to branch C, HEAD@{3} is A. git reflog helpfully annotates everything with the type, so you know what actually happened. So for example, if you accidentally dropped a patch during a rebase, you can look at the reflog and figure out when the rebase started. Then you either reset to that commit, or you just tig it and cherry-pick the missing commits back onto the current branch. Set yourself up with a test git repository and learn how to do exactly that now; it'll save you some time in the future.
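
For example, recovering from a rebase gone wrong might look like this (the reflog index is made up; check the output of git reflog for the real one):

git reflog                  # find the entry from just before the rebase started
git reset --hard HEAD@{5}   # move the branch back to that state
# or, less drastic, pick the dropped commit back onto the current branch:
git cherry-pick HEAD@{5}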

Note that the reflog is local only. And remember, if it hasn't been committed, it's not in the reflog.

Use a git push hook

Repeat after me: echo make > .git/hooks/pre-push. And no more embarrassment for pushing patches that don't compile. I've made that mistake too many times, so now I even use my own git patch-set command that will run a hook for me when I'm generating a patch set to send to a list. You might want to make the hooks executable btw.
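
In full, with the executable bit set (git silently ignores hooks that aren't executable):

echo make > .git/hooks/pre-push
chmod +x .git/hooks/pre-push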

March 07, 2014
a quick PSA:

For those using my prebuilt freedreno binaries for fedora, there is now a much better way.  Nicolas Chauvet has created a repo w/ latest mesa which will work with freedreno:

http://blog.kwizart.fr/post/2014/03/02/163-mesa-10.2-from-git-for-Fedora-20

Big thanks Nicolas!

Brief Glamor Hacks

Eric Anholt started writing Glamor a few years ago. The goal was to provide credible 2D acceleration based solely on the OpenGL API, in particular, to implement the X drawing primitives, both core and Render extension, without any GPU-specific code. When he started, the thinking was that fixed-function devices were still relevant, so that original code didn’t insist upon “modern” OpenGL features like GLSL shaders. That made the code less efficient and hard to write.

Glamor used to be a side-project within the X world, seen as something that really wasn’t very useful, something that any credible 2D driver would replace with custom highly-optimized GPU-specific code. Eric and I both hoped that Glamor would turn into something credible and that we’d be able to eliminate all of the horror-show GPU-specific code in every driver for drawing X text, rectangles and composited images. That hadn’t happened though, until now.

Fast forward to the last six months. Eric has spent a bunch of time cleaning up Glamor internals, and in fact he’s had it merged into the core X server for version 1.16 which will be coming up this July. Within the Glamor code base, he’s been cleaning some internal structures up and making life more tolerable for Glamor developers.

Using libepoxy

A big part of the cleanup was transitioning all of the extension function calls to use his other new project, libepoxy, which provides a sane, consistent and performant API to OpenGL extensions for Linux, Mac OS and Windows. That library is awesome, and you should use it for everything you do with OpenGL because not using it is like pounding nails into your head. Or eating non-tasty pie.

Using VBOs in Glamor

One thing he recently cleaned up was how to deal with VBOs during X operations. VBOs are absolutely essential to modern OpenGL applications; they’re really the only way to efficiently pass vertex data from application to the GPU. And, every other mechanism is deprecated by the ARB as not a part of the blessed ‘core context’.

Glamor provides a simple way of getting some VBO space, dumping data into it, and then using it through two wrapping calls which you use along with glVertexAttribPointer as follows:

pointer = glamor_get_vbo_space(screen, size, &offset);
glVertexAttribPointer(attribute_location, count, type,
              GL_FALSE, stride, offset);
memcpy(pointer, data, size);
glamor_put_vbo_space(screen);

glamor_get_vbo_space allocates the specified amount of VBO space and returns a pointer to that along with an ‘offset’, which is suitable to pass to glVertexAttribPointer. You dump your data into the returned pointer, call glamor_put_vbo_space and you’re all done.

Actually Drawing Stuff

At the same time, Eric has been optimizing some of the existing rendering code. But, all of it is still frankly terrible. Our dream of credible 2D graphics through OpenGL just wasn’t being realized at all.

On Monday, I decided that I should go play in Glamor for a few days, both to hack up some simple rendering paths and to familiarize myself with the insides of Glamor as I’m getting asked to review piles of patches for it, and not understanding a code base is a great way to help introduce serious bugs during review.

I started with the core text operations. Not because they’re particularly relevant these days, as most applications draw text with the Render extension to provide better looking results, but because they’re often one of the hardest things to do efficiently with a heavyweight GPU interface, and OpenGL can be amazingly heavyweight if you let it.

Eric spent a bunch of time optimizing the existing text code to try and make it faster, but at the bottom, it actually draws each lit pixel as a tiny GL_POINT object by sending a separate x/y vertex value to the GPU (using the above VBO interface). This code walks the array of bits in the font, checking each one to see if it is lit, then checking whether the lit pixel is within the clip region, and only then adding the coordinates of the lit pixel to the VBO. The amazing thing is that even with all of this CPU and GPU work, the venerable 6x13 font is drawn at an astonishing 3.2 million glyphs per second. Of course, pure software draws text at 9.3 million glyphs per second.

I suspected that a more efficient implementation might be able to draw text a bit faster, so I decided to just start from scratch with a new GL-based core X text drawing function. The plan was pretty simple:

  1. Dump all glyphs in the font into a texture. Store them in 1bpp format to minimize memory consumption.

  2. Place raw (integer) glyph coordinates into the VBO. Place four coordinates for each and draw a GL_QUAD for each glyph.

  3. Transform the glyph coordinates into the usual GL range (-1..1) in the vertex shader.

  4. Fetch a suitable byte from the glyph texture, extract a single bit and then either draw a solid color or discard the fragment.
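A hedged sketch of what step 2 might look like on the CPU side, reusing the VBO helpers shown earlier; the glyph_vertex layout and this helper are my own illustration, not the actual glamor code:

/* Sketch only: assumes glamor's internal headers for ScreenPtr and
 * the VBO helpers shown above. */
typedef struct {
    GLshort x, y;           /* destination position, in pixels */
    GLshort s, t;           /* source position in the glyph texture */
} glyph_vertex;

static void
emit_glyph_quads(ScreenPtr screen, GLint attr, int count, int w, int h,
                 const GLshort *dst, const GLshort *src)
{
    char *offset;
    glyph_vertex *v;
    int i;

    v = (glyph_vertex *) glamor_get_vbo_space(screen,
                                              count * 4 * sizeof(*v),
                                              &offset);
    glVertexAttribPointer(attr, 4, GL_SHORT, GL_FALSE, sizeof(*v), offset);
    for (i = 0; i < count; i++) {
        GLshort x = dst[2 * i], y = dst[2 * i + 1];
        GLshort s = src[2 * i], t = src[2 * i + 1];

        /* four integer corners per glyph; the vertex shader scales
         * these into the -1..1 GL range (step 3) */
        v[0] = (glyph_vertex) { x,     y,     s,     t     };
        v[1] = (glyph_vertex) { x + w, y,     s + w, t     };
        v[2] = (glyph_vertex) { x + w, y + h, s + w, t + h };
        v[3] = (glyph_vertex) { x,     y + h, s,     t + h };
        v += 4;
    }
    glamor_put_vbo_space(screen);
    glDrawArrays(GL_QUADS, 0, 4 * count);
}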

This makes the X server code surprisingly simple; it computes integer coordinates for the glyph destination and glyph image source and writes those to the VBO. When all of the glyphs are in the VBO, it just calls glDrawArrays(GL_QUADS, 0, 4 * count). The results were “encouraging”:

1: fb-text.perf
2: glamor-text.perf
3: keith-text.perf

       1                 2                           3                 Operation
------------   -------------------------   -------------------------   -------------------------
   9300000.0      3160000.0 (     0.340)     18000000.0 (     1.935)   Char in 80-char line (6x13) 
   8700000.0      2850000.0 (     0.328)     16500000.0 (     1.897)   Char in 70-char line (8x13) 
   6560000.0      2380000.0 (     0.363)     11900000.0 (     1.814)   Char in 60-char line (9x15) 
   2150000.0       700000.0 (     0.326)      7710000.0 (     3.586)   Char16 in 40-char line (k14) 
    894000.0       283000.0 (     0.317)      4500000.0 (     5.034)   Char16 in 23-char line (k24) 
   9170000.0      4400000.0 (     0.480)     17300000.0 (     1.887)   Char in 80-char line (TR 10) 
   3080000.0      1090000.0 (     0.354)      7810000.0 (     2.536)   Char in 30-char line (TR 24) 
   6690000.0      2640000.0 (     0.395)      5180000.0 (     0.774)   Char in 20/40/20 line (6x13, TR 10) 
   1160000.0       351000.0 (     0.303)      2080000.0 (     1.793)   Char16 in 7/14/7 line (k14, k24) 
   8310000.0      2880000.0 (     0.347)     15600000.0 (     1.877)   Char in 80-char image line (6x13) 
   7510000.0      2550000.0 (     0.340)     12900000.0 (     1.718)   Char in 70-char image line (8x13) 
   5650000.0      2090000.0 (     0.370)     11400000.0 (     2.018)   Char in 60-char image line (9x15) 
   2000000.0       670000.0 (     0.335)      7780000.0 (     3.890)   Char16 in 40-char image line (k14) 
    823000.0       270000.0 (     0.328)      4470000.0 (     5.431)   Char16 in 23-char image line (k24) 
   8500000.0      3710000.0 (     0.436)      8250000.0 (     0.971)   Char in 80-char image line (TR 10) 
   2620000.0       983000.0 (     0.375)      3650000.0 (     1.393)   Char in 30-char image line (TR 24)

This is our old friend x11perfcomp, but slightly adjusted for a modern reality where you really do end up drawing billions of objects (hence the wider columns). This table lists the performance for drawing a range of different fonts in both poly text and image text variants. The first column is for Xephyr using software (fb) rendering, the second is for the existing Glamor GL_POINT based code and the third is the latest GL_QUAD based code.

As you can see, drawing points for every lit pixel in a glyph is surprisingly fast, but only about 1/3 the speed of software for essentially any size glyph. By minimizing the use of the CPU and pushing piles of work into the GPU, we manage to increase the speed of most of the operations, with larger glyphs improving significantly more than smaller glyphs.

Now, you ask how much code this involved. And, I can truthfully say that it was a very small amount to write:

 Makefile.am        |    2 
 glamor.c           |    5 
 glamor_core.c      |    8 
 glamor_font.c      |  181 ++++++++++++++++++++
 glamor_font.h      |   50 +++++
 glamor_priv.h      |   26 ++
 glamor_text.c      |  472 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 glamor_transform.c |    2 
 8 files changed, 741 insertions(+), 5 deletions(-)

Let’s Start At The Very Beginning

The results of optimizing text encouraged me to start at the top of x11perf and see what progress I could make. In particular, looking at the current Glamor code, I noticed that it did all of the vertex transformation with the CPU. That makes no sense at all for any GPU built in the last decade; they’ve got massive numbers of transistors dedicated to performing precisely this kind of operation. So, I decided to see what I could do with PolyPoint.

PolyPoint is absolutely brutal on any GPU; you have to pass it two coordinates for each pixel, and so the very best you can do is send it 32 bits, or precisely the same amount of data needed to actually draw a pixel on the frame buffer. With this in mind, one expects that about the best you can do compared with software is tie. Of course, the CPU version is actually computing an address and clipping, but those are all easily buried in the cost of actually storing a pixel.

In any case, the results of this little exercise are pretty close to a tie — the CPU draws 190,000,000 dots per second and the GPU draws 189,000,000 dots per second. Looking at the vertex and fragment shaders generated by the compiler, it’s clear that there’s room for improvement.

The fragment shader simply pulls the constant pixel color from a uniform and assigns it to the fragment color in this, the simplest of all possible shaders:

uniform vec4 color;
void main()
{
       gl_FragColor = color;
}

This generates five instructions:

Native code for point fragment shader 7 (SIMD8 dispatch):
   START B0
   FB write target 0
0x00000000: mov(8)          g113<1>F        g2<0,1,0>F                      { align1 WE_normal 1Q };
0x00000010: mov(8)          g114<1>F        g2.1<0,1,0>F                    { align1 WE_normal 1Q };
0x00000020: mov(8)          g115<1>F        g2.2<0,1,0>F                    { align1 WE_normal 1Q };
0x00000030: mov(8)          g116<1>F        g2.3<0,1,0>F                    { align1 WE_normal 1Q };
0x00000040: sendc(8)        null            g113<8,8,1>F
                render ( RT write, 0, 4, 12) mlen 4 rlen 0      { align1 WE_normal 1Q EOT };
   END B0

As this pattern is actually pretty common, it turns out there’s a single instruction that can replace all four of the moves. That should actually make a significant difference in the run time of this shader, and this shader runs once for every single pixel.

The vertex shader has some similar optimization opportunities, but it only runs once for every 8 pixels — with the SIMD format flipped around, the vertex shader can compute 8 vertices in parallel, so it ends up executing 8 times less often. It’s got some redundant moves, which could be optimized by improving the copy propagation analysis code in the compiler.

Of course, improving the compiler to make these cases run faster will probably make a lot of other applications run faster too, so it’s probably worth doing at some point.

Again, the amount of code necessary to add this path was tiny:

 Makefile.am        |    1 
 glamor.c           |    2 
 glamor_polyops.c   |  116 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 glamor_priv.h      |    8 +++
 glamor_transform.c |  118 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 glamor_transform.h |   51 ++++++++++++++++++++++
 6 files changed, 292 insertions(+), 4 deletions(-)

Discussion of Results

These two cases, text and points, are probably the hardest operations to accelerate with a GPU and yet a small amount of OpenGL code was able to meet or beat software easily. The advantage of this work over traditional GPU 2D acceleration implementations should be pretty clear — this same code should work well on any GPU which offers a reasonable OpenGL implementation. That means everyone shares the benefits of this code, and everyone can contribute to making 2D operations faster.

All of these measurements were actually done using Xephyr, which offers a testing environment unlike any I’ve ever had — build and test hardware acceleration code within a nested X server, debugging it in a windowed environment on a single machine. Here’s how I’m running it:

$ ./Xephyr  -glamor :1 -schedMax 2000 -screen 1024x768 -retro

The one bit of magic here is the ‘-schedMax 2000’ flag, which causes Xephyr to update the screen less often when applications are very busy and serves to reduce the overhead of screen updates while running x11perf.

Future Work

Having managed to accelerate 17 of the 392 operations in x11perf, it’s pretty clear that I could spend a bunch of time just stepping through each of the remaining ones and working on them. Before doing that, we want to try and work out some general principles about how to handle core X fill styles. Moving all of the stipple and tile computation to the GPU will help reduce the amount of code necessary to fill rectangles and spans, along with improving performance, assuming the above exercise generalizes to other primitives.

Getting and Testing the Code

Most of the changes here are from Eric’s glamor-server branch:

git://people.freedesktop.org/~anholt/xserver glamor-server

The two patches shown above, along with a pair of simple clean up patches that I’ve written this week are available here:

git://people.freedesktop.org/~keithp/xserver glamor-server

Of course, as this now uses libepoxy, you’ll need to fetch, build and install that before trying to compile this X server.

Because you can try all of this out in Xephyr, it’s easy to download and build this X server and then run it right on top of your current driver stack inside of X. I’d really like to hear from people with Radeon or nVidia hardware to know whether the code works, and how it compares with fb on the same machine, which you get when you elide the ‘-glamor’ argument from the example Xephyr command line above.

March 05, 2014
Like most, I was taken by surprise last weekend; I never expected this to happen, at all.

Given that I was the one to poke a rather big hole in the Raspberry Pi image a year and a half ago, and since I have been playing a bit of a role in freeing ARM GPUs these last few years, I thought I'd say a few words on how I see things, why I said what I said back then, and why I didn't say other things.

It has become a bit of a long read, but it's an eventful story that spans about two years.

Getting acquainted.

I talked about the lima driver at LinuxTag in May 2012. As my contract work wasn't going anywhere real, I spent two weeks of codethink time (well, it's not as if I ever did spend just 8h per day working on lima) on getting the next step working with lima, so I had something nice to demo at LinuxTag. During that bringup, I got asked to form a plan of attack for freeing the Raspberry Pi GPU, for a big potential customer. Just after my lima talk at the conference, I downloaded some code and images through the LinuxTag wifi, and set about finding the shader compiler on the train on the way back.

I am not one for big micro-managed teams where constant communication is required. I am very big on structure and on finding natural boundaries (which is why you have the term modesetting and the structured approach to developing display drivers today). I tend to split up a massive task along those natural boundaries, and then have clued folks take full control and responsibility of those big tough subtasks. This is why I split the massive task of REing a GPU between the command stream side and the shader compiler. This is the plan of attack that I had formed in the first half of 2011 for Mali (which we are still sticking to today), and I was of course looking to repeat that for the RPi.

On the train, I disassembled glCompileShader from the VideoCore userspace binary, and found the code to be doing surprisingly little. It quickly went down to some routines which were used by pretty much all gl and egl calls. And those lower routines were basically taking in some ids and some data, and then passing it to the kernel. A look at the kernel code then told me that there was nothing but passing data through happening at that side...

There was nothing real going on, as all the nifty bits were running somewhere else.

In the next few days, I talked to some advanced RPi users on IRC, and we figured out that even the bootloader runs on the VideoCore, and that it is the VideoCore that brings up the ARM core. A look at the 3 different binary blobs, for 3 different memory divisions between the ARM and the VideoCore, revealed that they were raw blobs, running directly on the VideoCore. Strings were very clearly visible, and what was immediately clear is that the 3 blobs differed mostly in addresses. Apart from the addresses, nothing was immediately apparent, apart from the fact that even display code was handled by the VideoCore. I confirmed this with Ben Brewer, who was then working on the shaders for Lima, and who has a processor design background.

The news therefore wasn't good. The Raspberry Pi is a completely closed system with a massive black box pulling all the strings. But at least it is consistently closed, and there is only limited scope for userspace and 'The Blob' getting out of sync, unlike the PowerVR, where userspace, kernel and microcode are almost impossible to keep in sync.

So I wrote up the proposal for the potential customer (you know who you are): basically a few man-months of busywork (my optimistic estimate doubled, and then doubled again, as I never do manage to keep my own optimistic timelines) to free the userspace shim, and a potential month for Ben for poking at the VideoCore blobs. Of course, I never heard back from codethink sales, which was expected with news this bad.

I decided to stay silent about the Raspberry Pi being such a closed system, at least for a few weeks. For once, this customer was trying to do The Right Thing by talking to the right people, instead of just making big statements, no matter what damage they cause or what morals they breach. I ended up being quite busy in the next few months and I kind of forgot about making some noise. This came back to haunt me later on.

The noise.

So on October 24th of 2012, the big announcement came. Here is the excerpt that I find striking:
"[...] the Raspberry Pi is the first ARM-based multimedia SoC with
fully-functional, vendor-provided (as opposed to partial, reverse engineered)
fully open-source drivers, and that Broadcom is the first vendor to open their
mobile GPU drivers up in this way. We at the Raspberry Pi Foundation hope to
see others follow."
While the text of the announcement judiciously talks about "userland" and "VideoCore driver code which runs on the ARM", it all goes massively downhill with the above excerpt. Together with the massive cheering coming from the seemingly very religious RPi community, the fact that the RPi's SoC was a massively closed system was completely drowned out.

Sure, this release was a very good step in the right direction. It allowed people to port and properly maintain the RPi userspace drivers, and took away a lot of the overhead for integration. But the above statement very deliberately brushed the truth about the BCM2835 SoC under the carpet, and then went on to brashly call for other vendors to follow suit, even though those others had chosen very different paths in their system design, and did not depend on a massive all-controlling black box.

I was appalled. Why was this message so deliberately crafted, in what should be a very open project? How could people be so shortsighted and not see the truth, even with the source code available?

What was even worse was the secondary message here. To me, it stated "we are perfectly happy with the big binary blob pulling the strings in the background". And with their call to other vendors, they pretty much told them: "keep the real work closed, but create an artificial shim to aid porting!". Not good.

So I went and talked to the other guys on the #lima channel on IRC. The feeling amongst the folks in our channel was a pretty uniform one of disgust. While at first glance this post looked like a clear and loud message in support of open drivers, it only looked like that on the absolute surface. Anyone who would dig slightly deeper would soon see the truth, and the message would turn into "Keep the blob, guys!".

It was clear that this message from the RPi foundation was not helping the cause of freeing ARM GPUs at all. So I decided to react to this announcement.

The shouting

So I decided to go post in the broadcom forums, and this is what I wrote:
Erm… Isn’t all the magic in the videocore blob? Isn’t the userland code you
just made available the simple shim that passes messages and data back and forth
to the huge blob running on the highly proprietary videocore dsp?

– the developer of the lima driver.
Given the specific way in which the original RPi foundation announcement was written, I had expected everyone from the RPi foundation itself to know quite clearly that this was indeed just the shim driver. The reminder that I am the lima driver developer should've taken away any doubt about me actually knowing what I was talking about. But sadly, these assumptions were proven quite wrong when Liz Upton replied:
No.

There’s some microcode in the Videocore – not something you should confuse with an
ARM-side blob, which could actually prevent you from understanding or modifying
anything that your computer does. That microcode is effectively firmware.
She really had no idea to what extent the videocore was running the show, and then happily went and argued with someone whom most people would label an industry expert. Quite astounding.

Things of course went further downhill from there, with several RPI users showing themselves from their best side. Although I did like the comment about ATI and ATOMBIOS, some people at least hadn't forgotten that nasty story.

All-in-all, this was truly astounding stuff, and it didn't help my feeling that the RPI foundation was not really intending to do The Right Thing, but instead was more about making noise, under the guise of marketing. None of this was helping ARM GPUs or ARM userspace blobs in any way.

More shouting

What happened next was also bad. Instead of sticking to the facts, Dave Airlie went and posted this blog entry, reducing himself to the level of Miss Upton.

A few excerpts:
"Why is this not like other firmware (AMD/NVIDIA etc)?
The firmware we ship on AMD and nvidia via nouveau isn't directly controlling the GPU
shader cores. It mostly does ancillary tasks like power management and CPU offloading.
There are some firmwares for video decoding that would start to fall into the same
category as this stuff. So if you want to add a new 3D feature to the AMD gpu driver
you can just write code to do it, not so with the rPI driver stack."
"Will this mean the broadcom kernel driver will get merged?
No.

This is like Ethernet cards with TCP offload, where the full TCP/IP stack is run on the
Ethernet card firmware. These cards seem like a good plan until you find bugs in their
firmware stack or find out their TCP/IP implementation isn't actually any good. The same
problem will occur with this. I would take bets the GLES implementation sucks, because
they all do, but the whole point of open sourcing something is to allow other to improve
it something that can't be done in this case."
Amazing stuff.

First of all, with the VideoCore being responsible for booting the bcm2835, and the bcm2835 already being a supported platform in the linux kernel (as of version 3.7), all the damage had been done already. Refusing yet another tiny message passing driver, one with a massive user base that will see a lot of testing, is then rather short-sighted. Especially since this would have made it much easier for the massive RPi userbase to use the upstream kernel.

Then there is the fact that this supposed kernel driver would not be a drm driver. It's simply not up to the self-appointed drm maintainer to either accept or refuse this.

And to finish off: a few years ago, Dave forked the RadeonHD driver code to create a driver that did things the ATI way (read: much less free, much more dependent on BIOS bytecode, and with many other corners being implemented the broken ATI way), wasted loads of AMD money, stopped register documentation from coming out, and generally made sure that fglrx still rules supreme today. And all of this for just "marketing reasons", namely the ability to trump SuSE with noise. With a history like that, you just cannot go and claim the moral high ground on anything anymore.

Like I've stated before, open source software is about making noise, and Dave's blog entry was nothing more and nothing less. Noise breeds more noise in our parts. This is why I decided not to provide a full and proper analysis of the situation back then: we had lowered ourselves to the level that the RPi foundation had chosen, and nothing I could have said would've improved on that.

Redemption

I received an email from Eben Upton last friday, where he pointed out the big news. I replied that I would have to take a thorough look first, but then also stated that I didn't expect Eben to email me if this hadn't been the real thing. I then congratulated him extensively.

And congratulations really are in order here.

The Raspberry Pi is a global sales phenomenon. It created a whole new market, a space that devices like the Cubieboard and Olimex boards now also occupy. It made ARM development boards accessible and affordable, and put Linux on ARM in the hands of 2.5 million people. It is so big that the original goal of the Raspberry Pi, namely education, has been somewhat drowned out. But just the fact that so many of these devices have been sold already will have a bigger educational impact than any direct action towards that goal. This gives the RPi foundation a massive amount of power, so much so that even the most closed SoC around is being opened now.

But in October of 2012, my understanding was that the RPi foundation wasn't interested in wielding that power to reach those goals which I and many others hold dear. And I was really surprised when it was now revealed that they did, as I had personally given up on this ever happening, and I had assumed that the RPi foundation was not intending to go all the way.

What's there?

After the email from Eben, I joined the #raspberrypi-internals channel on freenode to ask the guys who are REing the VideoCore, to get an expert opinion from those actually working on the hardware.

So what has appeared:
* A user manual for the 3D core.
* Some sample code for a different SoC, but code which runs on the ARM core and not on the VideoCore.
* This sample code includes some interesting headers which help the guys who are REing the VideoCore.
* There's a big promise of releasing more about the VideoCore and at least providing a properly free bootloader.

These GPU docs and code not only mean that Broadcom has beaten ARM, Qualcomm, Nvidia, Vivante and Imagination Technologies to the post, they also make Broadcom second only to Intel (yes, ahead of AMD, or should I say ATI, as ATI has run most of the show since 2008).

Then there is the extra info on the VideoCore and the promise of a fully free bootloader. Yes, there still is the massive dependence on the VideoCore for doing all sorts of important SoC things, but releasing this information takes time, especially since this was not part of the original SoC release process and it has to be made up for afterwards. The future is looking bright for this SoC though, and we really might get to a point where the heavily VideoCore based BCM2835 SoC becomes one of the most open SoCs around.

Today though, the BCM2835 still doesn't hold a candle to the likes of the Allwinner A10 with regards to openness. But the mid to long term difference is that most of Allwinner's openness so far was accidental (they are just now starting to directly cooperate with the linux-sunxi community), and Allwinner doesn't own all the IP used in its SoC (ARM owns the GPU, for instance, and is being a big old embedded dinosaur about it too). The RPi SoC not only has very active backing from the SoC vendor, that SoC vendor also happens to be the full owner of the IP at the same time.

The RPI future looks really bright!

Whining.

I do not like the 10k bounty on getting Q3A going with the freshly released code. I do not think this is the best way to get a solid GPU driver out, as it mostly encourages a single person to make a quick and dirty port of the freshly released code. I would've preferred to have seen someone from the RPI foundation create a kanban with clearly defined goals, and then let the community work off the different work items, after which those work items are graded and rewarded. But then, it's the RPI foundation's money, and this bounty is a positive action overall.

I am more worried about something else. With the bounty and with the promise of a free bootloader, the crazy guys who are REing the VideoCore get demotivated twice. This 10k bounty rules them out, as it calls for someone to do a quick and dirty hack, and not for long hard labour towards The Right Thing. The promise of a free bootloader might make their current work superfluous, and has the potential to halt REing work altogether. These guys are working on something which is or was at least an order of magnitude harder than what the normal GPU REing projects have or had to deal with. I truly hope that they get some direct encouragement and support from the RPI foundation, and that both parties work together on getting the rest of the VideoCore free.

All in all, this Broadcom release is the best news I have heard since I started on the open ARM GPU path; it actually is the best news I've had since AMD accepted our proposal to free the ATI Radeon (back in June 2007). This release fully makes up for the bad communication of October 2012, and has opened the door to making the BCM2835 the most free SoC out there.

So to Eben and the rest of the guys from the RPI Foundation: Well done! You made us all very happy and proud.
March 03, 2014

Hi, so I wrote a blog entry asking people to contribute to the PiTiVi fundraiser effort last week. I am happy to see they have already reached 10K Euro, but their goal of course is to get more. I am also happy to say that the GStreamer Project decided to support the fundraiser using some of the money we have earned over the years doing Google Summer of Code and organizing the GStreamer Conference, which added 2500€ to the effort. You can read about the GStreamer donation here.

Anyway, I would once again want to ask people to contribute to this effort. There is a proven team behind this fundraiser, so this is not like a kickstarter where people are starting from scratch and you have no idea where they will end up going or what they might achieve. This is an existing project that just needs some polish to get it to critical mass. So visit the fundraising page and make your pledge.

March 02, 2014

Oort is an experimental programming language I have been working on, on and off (mostly off), since 2007. It is a statically typed, object-oriented, imperative language, where classes, functions and methods can be nested arbitrarily, and where functions and methods are full closures, i.e., they can be stored in variables and returned from functions. The control structures are the usual ones: if, for, while, do, goto, etc.

It also has an unusual feature: goto labels are first class.

What does it mean for labels to be first class? It means two things: (1) they are lexically scoped so that they are visible from inside nested functions. This makes it possible to jump from any point in the program to any other location that is visible from that point, even if that location is in another function. And (2) labels can be used as values: They can be passed to and returned from functions and methods, and they can be stored in data structures.

As a simple example, consider a data structure with a “foreach” method that takes a callback function and calls it for every item in the data structure. In Oort this might look like this:

table: array[person_t];

table.foreach (fn (p: person_t) -> void {
        print p.name;
        print p.age;
    });

A note about syntax. In Oort, anonymous functions are defined like this:

fn (<arguments>) -> <return type> {
    ...;
}

and variables and arguments are declared like this:

<name>: <type>

so the code above defines an anonymous function that prints the name and the age of a person, and passes that function to the foreach method of the table.

What if we want to stop the iteration? You could have the callback return true to stop, or you could have it throw an exception. However, both methods are a little clumsy: The first because the return value might be useful for other purposes, the second because stopping the iteration isn’t really an exceptional situation.

With lexically scoped labels there is a direct solution – just use goto to jump out of the callback:

  table.foreach (fn (p: person_t) -> void {
          print p.name;
          print p.age;

          if (p.age > 50)
              goto done;
      });

@done:

Note what’s going on here: Once we find a person older than 50, we jump out of the anonymous callback and back into the enclosing function. The git tree has a running example.

Call/cc in terms of goto
In Scheme and some other languages there is a feature called call/cc, which is famous for being both powerful and mind-bending. What it does is, it takes the concept of “where we are in the program” and packages it up as a function. This function, called the continuation, is then passed to another, user-defined, function. If the user-defined function calls the continuation, the program will resume from the point where call/cc was invoked. The mind-bending part is that a continuation can be stored in data structures and called multiple times, which means the call/cc invocation can in effect return more than once.

Lexically scoped labels are at least as expressive as call/cc, because if you have them, you can write call/cc as a function:

call_cc (callback: fn (k: fn()->void)) -> void
{
    callback (fn() -> void { 
        goto current_continuation;
    });

@current_continuation:
}

Let’s see what’s going on here. A function called call_cc() is defined:

call_cc (...) -> void
{
}

This function takes another function as argument:

callback: fn (...) -> void

And that function takes the continuation as an argument:

k: fn()->void

The body of call/cc calls the callback:

callback (...);

passing an anonymous function (the continuation):

    fn() -> void {
        goto current_continuation;
    }

@current_continuation:

that just jumps to the point where call_cc returns. So when callback decides to invoke the continuation, execution will resume at the point where call_cc was invoked. Since there is nothing stopping callback from storing the continuation in a data structure or from invoking it multiple times, we have the full call/cc semantics.
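To make the control flow concrete, here is a hedged usage sketch in the same style as the examples above (the strings and exact print syntax are my own, not from the Oort sources):

call_cc (fn (k: fn()->void) -> void {
        print "in the callback";
        k ();
        print "never reached";
    });

print "after call_cc";

Calling k() jumps straight to the point where call_cc returns, so "never reached" is skipped and "after call_cc" is printed next.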

Cooperative thread system
One of the examples on the Wikipedia page about call/cc is a cooperative thread system. With the call_cc function above, we could directly translate the Wikipedia code into Oort, but using the second aspect of the first-class-ness of labels – that they can be stored directly in data structures – makes it possible to write a more straightforward version:

run_list: list[label] = new list[label]();

thread_fork (child: fn() -> void)
{
    run_list.append (me);
    child();
    goto run_list.pop_head();
@me:
}

thread_yield()
{
    run_list.append (me);
    goto run_list.pop_head ();
@me:
}

thread_exit()
{
    if (!run_list.is_empty())
        goto run_list.pop_head();
    else
        process_exit();
}

The run_list variable is a list of labels containing the current positions of all the active threads. The keyword label in Oort is simply a type specifier similar to string.

To create a new thread, thread_fork first saves the position of the current thread on the list, and then it calls the child function. Similarly, thread_yield yields to another thread by saving the position of the current thread and jumping to the first label on the list. Exiting a thread consists of jumping to the first thread if there is one, and exiting the process if there isn’t.
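As a hedged illustration (again a made-up example following the syntax above), forking a child and bouncing control between the two threads might look like this:

thread_fork (fn () -> void {
        print "child";
        thread_yield ();
        thread_exit ();
    });

print "parent";
thread_yield ();

This prints "child", then "parent": thread_fork saves the parent's position and runs the child until its first yield, and the parent's final thread_yield then resumes the child so it can exit.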

The code above doesn’t actually run because the current Oort implementation doesn’t support genericity, but here is a somewhat uglier version that actually runs, while still demonstrating the principle.

February 25, 2014
As you've probably seen in my previous post, the new Videos UI has part of its interface focused on various channels from online sources, such as Blip.tv, or videos from the Guardian.

Grilo recently grew support for Lua sources, which means you can write about 100 lines of Lua and integrate videos from an online source into Videos easily.

The support isn't restricted to videos: GNOME Music, GNOME Photos and a number of other applications will also be able to be extended this way.

Small tutorial by example

Our example is that of a source that would fetch the list of Ogg Theora streams from Xiph.org's streaming directory.

First, define the "source": the name is what will show up in the interface, supported_keys lists the metadata fields that we'll be filling in for each media item, and supported_media mentions we only show videos, so the source can be skipped in music players.


source = {
  id = 'grl-xiph-example',
  name = 'Xiph Example',
  supported_keys = { 'id', 'title', 'url', 'type' },
  supported_media = 'video',
}

We'll then implement one of the source methods, to browse/list items in that source. First, we cheat a bit and tell the caller that we don't have any more items if it asks us to skip some. This is usual for sources with few items, as the front-end is unlikely to list items 2 by 2. Otherwise, we fetch the page on the Xiph website and wait for the callback in fetch_cb.


function grl_source_browse(media_id)
  if grl.get_options("skip") > 0 then
    grl.callback()
  else
    grl.fetch('http://dir.xiph.org/by_format/Ogg_Theora', 'fetch_cb')
  end
end

Here's the meat of the script, where we parse the web page into media items. Lua doesn't use regular expressions, but patterns. They're different, and I find them easier to grasp. Remember that the minus sign/dash is a reserved character, '%' is the escape character, and '()' enclose the match.
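For instance, here is a tiny self-contained Lua snippet (the input string is made up) exercising the pattern used below:

local s = '<p class="stream-name">Radio Example</p>'
print(s:match('<p class="stream%-name">(.-)</p>'))
-- prints: Radio Example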

We create a new table for each one of the streams in the HTML we scraped, with the metadata we said we'd give in the source definition, and send it back to Grilo. The '-1' there is the number of items remaining in the list.

Finally, we call grl.callback() without any arguments to tell it we're done fetching the items.


function fetch_cb(results)
  if not results then
    grl.callback()
    return
  end

  for stream in results:gmatch('<p class="stream%-name">(.-)</p>') do
    local media = {}
    media.url = stream:match('href="(.-)" ')
    media.id = media.url
    media['type'] = 'video'
    media.title = stream:match('<a href.->(.-)</a>')

    grl.callback(media, -1)
  end

  grl.callback()
end

We're done! You just need to drop this file in ~/.local/share/grilo-plugins/grl-lua-factory, and you can launch Videos or the test application grilo-test-ui-0.2 to see your source in action.



Why Lua?

This screen scraping is what Lua is good at, with its powerful yet simple pattern matching. Lua is also easily embeddable, with very few built-in functions, which means we can have better control over the API plugins use, a low footprint, and all the benefits of an interpreted garbage-collected language.

I hear heckles of "Javascript" in the background, so I guess I better address those as well. I think Lua's pattern matching is better than Javascript regexes, and more importantly, Lua is more easily embeddable in big applications, because of its simplicity as a language and a VM. Basically, Javascript (and the gjs implementation we'd likely have used in particular) is too powerful for our use case.

Better sources

It's obviously possible to avoid this screen scraping when the online source provides data in an easily parseable format (such as JSON, for which we have Lua bindings). That will be the case for the Guardian videos source (once we've figured out a minor niggle with the 50 items query limit) thanks to the Guardian's Open Data work.

Hopefully it means that we'll have sources for the Wiki Commons Picture of the day (as requested by GNOME alumni Luis Villa) for use in the Background settings, or for Mediagoblin installations.

Videos sidenote

As an aside, for those of you who have videos on separate network storage, or videos not indexed by Tracker, there's a hidden configuration option to show specific paths in the Channels tab in Videos.

gsettings set org.gnome.totem filesystem-paths "['smb://myserver.local/videos/']"

Epilogue

I'm looking forward to seeing more Grilo sources. I feel that this Lua source lowers the barrier to entry, enabling the evening hacker to integrate their favourite media source into GNOME, which probably means that we'll need to think about parental controls soon! ;)

Thanks to Victor Toso for his work on Lua sources, both during and after the Summer of Code, and Juan Suarez for his mentoring and patch reviewing.
February 24, 2014

So the PiTiVi team announced the PiTiVi fundraising campaign on Friday. I sincerely hope they are successful, because I think we really need a good non-linear video editor that runs on the Linux desktop, especially one that is built on top of GStreamer and thus shares the core multimedia infrastructure with the rest of your desktop. The current PiTiVi team has the right skills and enthusiasm in my view to truly pull this off, and their project is scoped in a manner that makes me believe they can. PiTiVi is already functional, and this fundraiser is more about accelerating ongoing development as opposed to creating something from scratch. And their funding requirements for reaching the base milestone are rather modest; for example, if just the employees of the 3 main Linux distribution companies pitched in 3-4 Euro each, it would be enough to cover the base funding goal.

But I think this fundraiser is important beyond the PiTiVi project as well, because it can serve as a precedent, showing that it is possible to do significant crowdfunding around open source development, and thus open the gate for more projects accelerating their development this way. There are a lot of great open source projects out there created on a volunteer basis, which is great, and like PiTiVi they will flourish even without crowdfunding, but crowdfunding can be a great way for the developers of the most interesting projects to focus solely on their project for some time and thus accelerate its development significantly. So in the case of PiTiVi, I am sure the team will be able to achieve all the goals they have outlined in the funding campaign even if the fundraiser raises no money; the difference is whether they do it in 1 year or in 5 years.

So personally I donated 60 Euro to the PiTiVi fundraiser and I hope everyone reading this blog entry will do the same. Let's give the people developing this and other great open source tools our support and help them make their great software even better. This fundraiser is run by people passionate about open source and their project, because, to be fair, no matter whether the effort ends up raising closer to 30 000€ or closer to 100 000€, it is in no way what anyone could call a get-rich-quick scheme, but rather a modest amount that will let two talented open source developers spend time working full-time on a project we all want.

And remember, whenever a major project using GStreamer gets a boost, it gives all GStreamer projects a boost. For instance, in my own pet project, Transmageddon, I have gotten a lot of help over the years from general improvements in GStreamer done due to the involvement of PiTiVi developers, and I have even ended up copying code from PiTiVi itself a few times to quickly and easily solve some challenges I had in Transmageddon.

There has been much discussion online this weekend of what happens when you introduce an extra “goto fail” line into your code, such as by making a mistake with a merge tool while combining your changes with those made in parallel in a code repository under active development by other engineers. (That is the most common cause I’ve seen of such duplication in source code, and thus the one I see most likely — others guess similarly and discuss that side more in depth — but that’s not what this post is about.)

Adam Langley posted a good explanation of this bug, including a brief snippet of code showing the general class of problem:

 1     extern int f();
 2
 3     int g() {
 4             int ret = 1;
 5
 6             goto out;
 7             ret = f();
 8
 9     out:
10             return ret;
11     }
about which he pointed out: “If I compile with -Wall (enable all warnings), neither GCC 4.8.2 or Clang 3.3 from Xcode make a peep about the dead code.” Others pointed out that the -Wunreachable-code option is one of many that neither compiler includes in -Wall by default, since both compilers treat -Wall not as “all possible warnings” but rather more as “the subset of warnings that the compiler authors think are suitable for use on most software”, leaving out those that are experimental, unstable, controversial, or otherwise not quite ready for prime time.

Since this has a simple test case, and I’ve got a few compilers handy on my Solaris x86 machine for doing various builds with, I did some testing myself:


gcc 3.4.3

% /usr/gcc/3.4/bin/gcc -Wall -Wextra -c fail.c
[no warnings or other output]

% /usr/gcc/3.4/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
fail.c: In function `g':
fail.c:7: warning: will never be executed

As expected, gcc warned when the specific flag was given.

gcc 4.5.2 & 4.7.3

% /usr/gcc/4.5/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
[no warnings or other output]

% /usr/gcc/4.7/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
[no warnings or other output]

It turns out this is due to the gcc maintainers removing support for -Wunreachable-code in later versions.

clang 3.1

% clang -Wall -Wextra -c fail.c
[no warnings or other output]

% clang -Wall -Wunreachable-code -c fail.c
fail.c:7:8: warning: will never be executed [-Wunreachable-code]
        ret = f();
              ^
1 warning generated.

% clang -Wall -Weverything -c fail.c
fail.c:3:5: warning: no previous prototype for function 'g'
      [-Wmissing-prototypes]
int g() {
    ^
fail.c:7:8: warning: will never be executed [-Wunreachable-code]
        ret = f();
              ^
2 warnings generated.

So clang also finds it with -Wunreachable-code, which is also enabled by -Weverything, but not -Wall or -Wextra. (I didn’t see a similar -Weverything flag for gcc.)

Solaris Studio 12.3

% cc -c fail.c
"fail.c", line 7: warning: statement not reached

% lint -c fail.c

statement not reached
    (7)

Both Studio’s default compiler flags and old-school lint static analyzer caught this by default. Though, as I noted last year, warnings about some unreachable code chunks will be seen in some releases but not others, as the information available to the compiler about the control flow evolves.


As anyone who has compiled much software with more than one of the above (including just different versions of the same compiler) can attest, the warnings generated on real code bases by various compilers will almost always be different. There are many cases where gcc & clang warn about things Studio doesn’t, and vice versa, as every compiler & analyzer has been fed different test cases over the years of bad code that developers wanted them to find, or specific problems they needed to solve. This is why, when compiling the Solaris kernel, we run two compilation passes on all the sources - once with Studio cc to both get its code analysis and to make the binaries we ship, and once with gcc just to get its code analysis, plus running it through Oracle’s internal parfait static analyzer as well.

Having a growing body of warnings doesn’t help either, if you never do anything about them. This is why Solaris kernel and core utility code have policies on not introducing new warnings and we’ve been making efforts in X.Org to clean up warnings, so that you can see new issues when you introduce them, not lose them in a sea of thousands of existing bits of noise.

Of course, it’s too late to stop the bug that caused this discussion from slipping into the code base, and easy to second guess after the fact how it may have been prevented. We have lots of tools at our disposal and they continue to grow and improve, including human code review (though human review is much less reliable and repeatable than most tools, humans are much more flexible at finding things the automated tools weren't prepared to catch), and we need to use multiple ones (more on the most sensitive code, such as security protocols) to improve our chances of finding and fixing the bugs before we integrate them. Hopefully we can learn from the ones that slip through how to improve our checks for the future, such as adding -Wunreachable-code compiler flags where appropriate.

February 19, 2014
The FOSDEM organizers have captured more than 16h of video for 22 DevRooms and they are still working hard on getting all of these videos cut and transcoded and put on their site. This is a mammoth task, and some talks in the graphics devroom are still up in the air because of this.

Luckily, some people from the sunxi community took the livestream of sunxi related talks, and produced some quick dumps of it. I have now copied the Mali driver talk over to my fd.o account so that the sunxi server doesn't become too overloaded. Slides are also available.

This talk mentions where I am at with my mesa driver, why I am taking the route I am taking, the importance of SoC communities and how the lack thereof hampers driver development, and also talks about the company ARM and their stance towards the open driver.

Enjoy!
February 17, 2014

For a long time, connecting your TV or other external monitors to your laptop has been a hassle with cables and drivers. Your TV had an HDMI port, but your laptop only DVI; your projector requires VGA and your MacBook can only do DP. Once you got a matching adapter, you noticed that the cable is far too short for you to sit comfortably on your couch. Fortunately, we’re about to put an end to this. At least, that was my impression when I first heard of Miracast.

Miracast is a certification program of the Wifi-Alliance based on their Wifi-Display specification. It defines a protocol to connect external monitors via wifi to your device. A rough description would be “HDMI over Wifi” and in fact the setup is modeled closely to the HDMI standard. Unlike existing video/audio-streaming protocols, Miracast was explicitly designed for this use-case and serves it very well. When I first heard of it about one year ago, I was quite amazed that it took us until late 2012 to come up with such a simple idea. And it didn’t take me long until I started looking into it.

For 4 months now I have been hacking on wifi, gstreamer and the kernel to get a working prototype. Despite many setbacks, I didn’t lose my fascination for this technology. I did complain a lot about the hundreds of kernel dead-locks, panics and crashes during development and the never-ending list of broken wifi devices. However, to be fair, I got the same amount of crashes and bugs on Android and competing proprietary Miracast products. I found a lot of other projects with the same goal, but the majority were closed-source, driver-specific, only proofs-of-concept or limited to the sink side. Nevertheless, I continued pushing code to the OpenWFD repository and registered as a speaker for FOSDEM 2014 to present my results. Unfortunately, it took me until 1 week before FOSDEM to get the wifi-P2P link working reliably with an upstream driver. So during these 7 days I hacked up the protocol and streaming part and luckily got a working demo just in time. I never dared pushing that code to the public repository, though (among others, it contains my birthday, which is surprisingly page-aligned, as magic mmap offset) and I doubt it will work on any setup but mine.

The first day after FOSDEM I trashed OpenWFD and decided to start all over again writing a properly integrated solution based on my past experience. Remembering the advice of a friend (“Projects with ‘Open’ in their name are just like countries named ‘Democratic Republic of …’”) I also looked for a new name. Given the fact that it requires a miracle to get Wifi-P2P working, it didn’t take me long to come up with something I liked. I hereby proudly announce: MiracleCast

MiracleCast

The core of MiracleCast is a daemon called miracled. It runs as system-daemon and manages local links, performs peer-discovery on request and handles protocol encoding and parsing. The daemon is independent of desktop environments or data transports and serves as local authority for all display-streaming protocols. While Miracast will stay the main target, other competing technologies like Chromecast, Airtame and AirPlay can be integrated quite easily.

The miraclectl command-line tool can be used to control the daemon. It can create new connections, modify parameters or destroy them again. It also supports an interactive mode where it displays events from miracled when they arrive, displays incoming connection attempts and allows direct user-input to accept or reject any requests.

miracled and miraclectl communicate via DBus so you can replace the command-line interface with graphical helpers. The main objects on this API are Links and Peers. Links describe local interfaces that are used to communicate with remote devices. This includes wifi-devices, but also virtual links like your local IP-Network. Peers are remote devices that were discovered on a local link. miracled hides the transport-type of each peer so you can use streaming protocols on-top of any available link-type (given the remote side supports the same). Therefore, we’re not limited to Wifi-P2P, but can use Ethernet, Bluetooth, AP-based Wifi and basically any other transport with an IP layer on top. This is especially handy for testing and development.

Local processes can now register Sources or Sinks with miracled. These describe capabilities that are advertised on the available links. Remote devices can then initiate a connection to your sink or, if you’re a source, you can connect to a remote sink. miracled implements the transmission protocol and hides it behind a DBus interface. It routes traffic from remote devices to the correct local process and vice versa. Regardless of whether the connection uses Miracast, Chromecast or plain RTSP, the same API is used.

Current Status

The main target still is Miracast! While all the APIs are independent of the protocol and transport layer, I want to get Miracast working as it is a nice way to test interoperability with Android. And for Miracast, we first need Wifi-P2P working. Therefore, that’s what I started with. The current miracled daemon implements a fully working Wifi-P2P user-space based on wpa_supplicant. Everyone interested is welcome to give it a try!

Everything on top, including the actual video-streaming is highly experimental and still hacked on. But if you hacked on Miracast, you know that the link-layer is the hard part. So after 1 week of vacation (that is, 1 week hacking on systemd!) I will return to MiracleCast and tackle local sinks.

As I have a lot more information than I could possibly summarize here, I will try to keep some dummy documentation on the wiki until the first project release. Please feel free to contact me if you have any questions. Check out the git repository if you want to see how it works! Note that MiracleCast is focused on proper desktop integration instead of fast prototyping, so please bear with me if API design takes some time. I’d really appreciate help on making Wifi-P2P work with as many devices as possible before we start spending all our efforts on the upper streaming layers.

During the next weeks, I will post some articles that explain step by step how to get Wifi-P2P working, how you can get some Miracast protoypes working and what competing technologies and implementations are available out there. I hope this way I can spread my fascination for Miracast and provide it to as many people as possible!


February 14, 2014

Unit testing is a very powerful approach to quickly find issues that may have been introduced in your software. I have always been very interested in unit testing, even more since my first professional job at Panda Security was implementing a whole new unit testing system for a desktop antivirus.

In ModemManager we had some unit testing support, but it was really only focused on making sure that our AT response parsers worked correctly. These tests are of course very important, not saying they aren’t, but they only cover a very small subset of the implementation.

Supporting completely different devices from multiple vendors with completely different behaviors, while still providing a unified interface for all of them is a very challenging task. In particular, making changes to fix the behavior of one device may lead to broken behavior with others, and the AT response parser unit tests really are not up to the task of finding those issues. So, in order to improve that, we have now included a nice black box system testing environment along with the common unit tests, which can not only simulate AT-based modems, but also let us play with the ModemManager DBus interface directly. And all this by just running make check!

Keep on reading if you want to know how all this was set up :)

[Figure: simulator-step-1-bus]

Every test using the new black box system testing will run a common test setup method before the actual test (and a common teardown method after having run it). This setup method will take care of launching a private DBus session using the excellent GTestDBus implementation in GLib/GIO. Once the new bus is ready to use, the test setup will also request a Ping() to the standard org.freedesktop.DBus.Peer interface in ModemManager, which in turn will make the daemon get started by DBus. Once the test fixture is setup, the test itself will be able to make use of the ModemManager process running in the private session, without any interference with the ModemManager that may already be running in the standard system-level bus.
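As a rough idea of what such a setup method can look like, here is a minimal hedged sketch using GTestDBus; the service directory and bus name are assumptions, not necessarily what ModemManager’s actual test code uses:

#include <gio/gio.h>

/* hypothetical dir containing the daemon's .service activation file */
#define TEST_SERVICES_DIR "/tmp/mm-test-services"

static GTestDBus *test_bus;

static void
fixture_setup (void)
{
    GDBusConnection *bus;

    test_bus = g_test_dbus_new (G_TEST_DBUS_NONE);
    g_test_dbus_add_service_dir (test_bus, TEST_SERVICES_DIR);
    g_test_dbus_up (test_bus); /* also exports DBUS_SESSION_BUS_ADDRESS */

    bus = g_bus_get_sync (G_BUS_TYPE_SESSION, NULL, NULL);

    /* Ping() on the standard Peer interface makes the bus activate the daemon */
    g_dbus_connection_call_sync (bus,
                                 "org.freedesktop.ModemManager1", /* assumed name */
                                 "/org/freedesktop/ModemManager1",
                                 "org.freedesktop.DBus.Peer",
                                 "Ping",
                                 NULL, NULL,
                                 G_DBUS_CALL_FLAGS_NONE,
                                 -1, NULL, NULL);
    g_object_unref (bus);
}

static void
fixture_teardown (void)
{
    g_test_dbus_down (test_bus);
    g_object_unref (test_bus);
}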

[Diagram: step 2, modem simulator thread]

Each test will then set up the simulation of the modem by launching a separate thread running a Unix-domain socket service. This socket will have an abstract address, and will allow other processes to get connected to send/receive data. In addition to the socket service, the simulator will also read a dictionary of AT requests and responses from a text file included in the test context. Together, the socket and the AT chat dictionary simulate an AT port of a modem: AT requests will arrive through the socket, and the simulator will reply with the corresponding AT response from the AT chat dictionary. If the dictionary doesn’t have the response expected for a given request, it will just reply “ERROR”.
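
For reference, GIO makes the socket side quite easy; this is only a rough sketch with illustrative names, and the actual AT request/response handling is omitted:

    #include <gio/gio.h>
    #include <gio/gunixsocketaddress.h>

    static gboolean
    on_incoming (GSocketService    *service,
                 GSocketConnection *connection,
                 GObject           *source_object,
                 gpointer           user_data)
    {
        /* Read AT requests from the connection, look them up in the
         * dictionary, and write back the response ("ERROR" if unknown) */
        return TRUE;
    }

    static GSocketService *
    start_simulator (const gchar *abstract_name, GError **error)
    {
        GSocketService *service = g_socket_service_new ();
        GSocketAddress *address =
            g_unix_socket_address_new_with_type (abstract_name, -1,
                                                 G_UNIX_SOCKET_ADDRESS_ABSTRACT);
        gboolean added = g_socket_listener_add_address (G_SOCKET_LISTENER (service),
                                                        address,
                                                        G_SOCKET_TYPE_STREAM,
                                                        G_SOCKET_PROTOCOL_DEFAULT,
                                                        NULL, NULL, error);

        g_object_unref (address);
        if (!added) {
            g_object_unref (service);
            return NULL;
        }
        g_signal_connect (service, "incoming", G_CALLBACK (on_incoming), NULL);
        return service;
    }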

[Diagram: step 3, reporting the test modem]

But the private ModemManager instance doesn’t know anything about the new modem simulator being available, as there is no real physical port being created that ModemManager could get notified about via udev. So, once the simulator is in place, the test will report the abstract address of the socket via the SetProfile() method in the new org.freedesktop.ModemManager.Test interface. This method will just tell ModemManager that a new virtual AT port is available at a given abstract socket address, and ModemManager will then create a new AT virtual port object internally and treat it as if it were a new TTY reported by udev.
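
Conceptually, that report boils down to a single DBus call. The sketch below assumes SetProfile() takes a profile name and the socket address as two strings; the actual signature, well-known name and object path may well differ:

    #include <gio/gio.h>

    static void
    report_virtual_port (GDBusConnection *bus,
                         const gchar     *abstract_address,
                         GError         **error)
    {
        GVariant *reply;

        reply = g_dbus_connection_call_sync (bus,
                                             "org.freedesktop.ModemManager",  /* assumed name */
                                             "/org/freedesktop/ModemManager", /* assumed path */
                                             "org.freedesktop.ModemManager.Test",
                                             "SetProfile",
                                             g_variant_new ("(ss)",
                                                            "generic-3gpp",   /* assumed profile */
                                                            abstract_address),
                                             NULL,
                                             G_DBUS_CALL_FLAGS_NONE,
                                             -1, NULL, error);
        if (reply)
            g_variant_unref (reply);
    }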

[Diagram: step 4, modem detection]

As soon as ModemManager knows about the new virtual AT port, it will start probing it and create a whole Modem object internally. Of course, for this to work well, the simulated modem needs to be able to reply correctly to most AT requests sent by ModemManager. The current implementation provides a generic AT chat dictionary for 3GPP (GSM, UMTS, HSPA, LTE…) devices, and the overall idea is that tests can first load the generic one to get a basic subset of responses, and then load a vendor-specific one with vendor-specific responses (overriding the generic ones if needed).

[Diagram: step 5, the test itself]

And of course, the last step is the test itself. Once every previous step has been successfully executed, the tester can now play with the standard ModemManager interfaces in DBus, querying for modems, requesting PIN unlock, enabling or disabling the device, and so on.

The currently available setup is not fully featured yet; it just provides some basic building blocks to be able to extend the black box system tests further. The real work comes now: adding new behavior tests and writing new AT chat dictionaries to simulate different devices. If you want to play with this setup yourself, get the latest code from git master!


February 13, 2014
So as usual FOSDEM was a blast and as usual the hallway track was great too. The beer certainly helped with that ... Big props to the entire conference crew for organizing a stellar event and specifically to Luc and the other graphics devroom people.

I've also uploaded the slides for my Kernel GPU Driver Testing talk. My OCD even made me remove that additional dot and resurrect the missing 'e' in one of the slides. FOSDEM also had live-streaming, and rendered videos should eventually show up; I'll add the link as soon as it's there.

Update: FOSDEM staff published the video of the talk!

The slides and video of my talk from Steam Dev Days have been posted. It's basically the Haswell refresh (inside joke) of my SIGGRAPH talk from last year.

February 12, 2014

FOSDEM is over, and it has been a great time! I talked to many people (thanks to full devrooms ;-) Graph Databases and Config Management were crazy, JavaScript as well…) and there might be some cool new things coming soon, which we discussed there :-)
I also got some good and positive feedback on the projects I work on, and met many people from Debian, Kubuntu, KDE and GNOME (I hadn’t seen some of them for almost 3 years). One of the best things about being at FOSDEM is that you not only see people “of your own kind”; for example, for me as a Debian developer it was great to see Fedora people and discuss things with them, something which rarely happens at Debian conferences. Also, having GNOME and KDE closely together again (literally, their stands were next to each other…) is something I have missed since the last Desktop Summit in 2011.

My talks also went well, except for the projector-slides technical error at the beginning, which took quite some time to fix (I blame KScreen ;-) ).

In case you’re interested in the slides, here they are: slides for FOSDEM’14 AppStream/Listaller talks.

The slides can likely be understood without the talk; they are way too detailed (usually I only show images on slides, but that doesn’t help people who can’t see the talk ^^)

I hope I can make it to FOSDEM’15 as well – I’ve been there only once, but it already is my favourite FOSS conference (and I love Belgian waffles) ;-)

February 10, 2014

So thanks to the new GStreamer 1.x VAAPI package for Fedora, I was able to do a hardware encode with Transmageddon for the first time today. This has been working in theory for a while, but because I migrated Transmageddon from GStreamer 0.10.x to GStreamer 1.x at the wrong time relative to the gstreamer-vaapi development timeline, I wasn’t able to test it before.

Below is the GStreamer pipeline that Transmageddon uses now if you have the Intel hardware encoder packages installed. Click on the image for the full version.
[Image: the libva encoding pipeline]

I am also close to being able to release the new version of Transmageddon. Most of the recently found bugs have turned out to be in various GStreamer elements, so I am starting to feel confident about the current pipeline-building code in Transmageddon. I think that during the GStreamer hackfest in Munich next month I will make the new release, with nice new features such as general support for multiple audio streams, DVD ripping support and language tag setting support.

[Screenshot: Transmageddon git master]

So we had the DevConf conference here in Brno this weekend. One of the projects I am really excited about is Cockpit. Cockpit is a new server administration tool developed by Red Hat engineers which aims at providing a modern-looking and user-friendly interface for your servers. There have been many such efforts over the years, but what I feel makes this one special is that it got graphical designers and interface designers involved, to ensure that the user experience is kept in focus instead of being taken hostage by underlying APIs or systems. Too many such interfaces, be they web based or not, tend to both feel and look clunky, for instance sometimes exposing features not because anyone realistically would ever want them, but because the underlying library happens to have a call for them.

Cockpit should also hopefully put the final nail in the coffin for the so called ‘server desktop’. The idea that you need to be able to run a graphical shell using X on your server adds a lot of pain with little gain in my opinion. The Fedora Server product should hopefully become a great showpiece for how nice a Linux server can be to use and configure when you have something like Cockpit available.

There were some nice videos shown at the conference demonstrating what is already in Cockpit, so hopefully they will be available online soon. In the meantime I recommend taking a look at the Cockpit web page.

When the X server crashes it prints a backtrace to the log file. This backtrace looks something like this:


(EE) Backtrace:
(EE) 0: /usr/bin/Xorg (OsLookupColor+0x129) [0x473759]
(EE) 1: /lib64/libpthread.so.0 (__restore_rt+0x0) [0x3cd140f74f]
(EE) 2: /lib64/libc.so.6 (__select_nocancel+0xa) [0x3cd08ec78a]
(EE) 3: /usr/bin/Xorg (WaitForSomething+0x1ac) [0x46a8fc]
(EE) 4: /usr/bin/Xorg (SendErrorToClient+0x111) [0x43a091]
(EE) 5: /usr/bin/Xorg (_init+0x3b0a) [0x42c00a]
(EE) 6: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3cd0821d65]
(EE) 7: /usr/bin/Xorg (_start+0x29) [0x428c35]
(EE) 8: ? (?+0x29) [0x29]
This is a forced backtrace from the current F20 X server package, generated by killall -11 Xorg. There is not a lot of human-readable information, but you can see the call stack, and you can even recognise some internal functions. Now, in Fedora we compile with libunwind, which gives us relatively good backtraces. Without libunwind, your backtrace may look like this:

(EE)
(EE) Backtrace:
(EE) 0: /opt/xorg/bin/Xorg (xorg_backtrace+0xb5) [0x484989]
(EE) 1: /opt/xorg/bin/Xorg (0x400000+0x8d1a4) [0x48d1a4]
(EE) 2: /lib64/libpthread.so.0 (0x3cd1400000+0xf750) [0x3cd140f750]
(EE) 3: /lib64/libc.so.6 (__select+0x33) [0x3cd08ec7b3]
(EE) 4: /opt/xorg/bin/Xorg (WaitForSomething+0x3dd) [0x491a45]
(EE) 5: /opt/xorg/bin/Xorg (0x400000+0x3561b) [0x43561b]
(EE) 6: /opt/xorg/bin/Xorg (0x400000+0x43761) [0x443761]
(EE) 7: /opt/xorg/bin/Xorg (0x400000+0x9baa8) [0x49baa8]
(EE) 8: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3cd0821d65]
(EE) 9: /opt/xorg/bin/Xorg (0x400000+0x25df9) [0x425df9]
So, even less information, and it certainly makes it hard to figure out where to even get started. Luckily there is a tool to get some useful info out of that: eu-addr2line. All you need to do is install the debuginfo package for the crashing program. Then it's just a matter of copying addresses.

$ eu-addr2line -e /opt/xorg/bin/Xorg 0x48d1a4
/home/whot/xorg/xserver/os/osinit.c:132
Alright, this is useful now, I can download the source package and check where it actually goes wrong. But wait - it gets even better. Let's say you have a driver module in the callstack:

(EE) Backtrace:
(EE) 0: /opt/xorg/bin/Xorg (xorg_backtrace+0xb5) [0x484989]
(EE) 1: /opt/xorg/bin/Xorg (0x400000+0x8d1a4) [0x48d1a4]
(EE) 2: /lib64/libpthread.so.0 (0x3cd1400000+0xf750) [0x3cd140f750]
(EE) 3: /opt/xorg/lib/libinput.so.0 (libinput_dispatch+0x19) [0x7ffff1e51593]
(EE) 4: /opt/xorg/lib/xorg/modules/input/libinput_drv.so (0x7ffff205b000+0x2a12) [0x7ffff205da12]
(EE) 5: /opt/xorg/bin/Xorg (xf86Wakeup+0x1b1) [0x4af069]
(EE) 6: /opt/xorg/bin/Xorg (WakeupHandler+0x83) [0x444483]
(EE) 7: /opt/xorg/bin/Xorg (WaitForSomething+0x3fe) [0x491a66]
(EE) 8: /opt/xorg/bin/Xorg (0x400000+0x3561b) [0x43561b]
(EE) 9: /opt/xorg/bin/Xorg (0x400000+0x43761) [0x443761]
(EE) 10: /opt/xorg/bin/Xorg (0x400000+0x9baa8) [0x49baa8]
(EE) 11: /lib64/libc.so.6 (__libc_start_main+0xf5) [0x3cd0821d65]
(EE) 12: /opt/xorg/bin/Xorg (0x400000+0x25df9) [0x425df9]
You can see that we have an xf86libinput driver (libinput_drv.so), which in turn loads libinput.so. You can debug the crash the same way now, just change the addr2line argument:

$ eu-addr2line -e /opt/xorg/lib/libinput.so.0 libinput_dispatch+0x19
/home/whot/code/libinput/src/libinput.c:603
Having this information of course doesn't mean you can fix any bug. But when you're reporting a bug it can be invaluable. If I have access to the same rpms that you're running, it's possible to look up the context of the crash in the source. Or, even better, since you already have access to those, you can make debugging a lot easier by attaching the required bits and pieces to a bug report. Seeing a bug report where the reporter has already narrowed down where it crashes makes things a lot easier than guessing what went wrong based on hex numbers.

February 07, 2014

Some state-of-the-art progress on the project we’re proudly working on:

https://01.org/ozone-wayland/blogs/tiagovignatti/2014/chromium-browser-wayland



It’s been almost 4 years working for Lanedo, and I don’t regret a single day having worked for them. Still, I believe it was time for a change in my life, and since February 1st I am working as a freelance developer.

I’ve really been involved in pretty different things during my professional life… antivirus desktop applications and gateways, satellite orbit determination and time synchronization systems, VoIP-based push-to-talk server development, and of course NetworkManager, ModemManager, libqmi, libmbim, Tracker, OpenWRT or GTK+.

I would love to keep on working on all these Free Software projects, but I’m also open to exploring new technologies. Change is always good, they say.

So, if you’re looking for a passionate software engineer to develop, improve or integrate your system, just check my website and get in touch. I’d love to hear about your needs and issues and how we can solve them together :)


February 06, 2014
When I was still at SuSE, Localhorst would rent a Ford S-Max, stuff it to the brim with openSUSE kit and swag, and drive to FOSDEM. I usually was tolerated on board as well, with my sports bag full of devroom kit, provided I sang along to the radio and didn't mention Burger King. Everyone else at SuSE either made their own arrangements, or was stuck on a flight from Nuremberg to Brussels (which was quickly dubbed "the SuSE-Bomber").

After the massive and rather counterproductive layoffs after FOSDEM 2009, SuSE tended to organize a bus for its own employees. And from what I heard, it was a pretty good solution. Imagine a load of happy geeks, from a place in the world with the best beers, stuck on a bus. It made the whole event seem like a school trip, but one where some beers were actually allowed. And, since there usually was tons of extra space on the bus, a load of ex-SuSE guys got to hitch a ride as well. The result was that a lot of SuSE employees visited FOSDEM, got to catch up on things with some ex-SuSE guys, and generally started doing what conferences are for from the second the bus left Nuremberg. Since buses are cheap, this really was a perfect solution, and everyone was happy.

I never took the bus. For FOSDEM, I want to arrive in Brussels around suppertime on Friday, and leave Monday around midday (when the alcohol of the previous night has worn off a bit). The bus tended to arrive around suppertime as well, but would leave again around 19:00 on Sunday. Also, I tend to run a devroom and have at least one talk; I need those 5 hours of peace on the train to prepare my talk. But I heard good things about the bus, and that it all was great fun and a bit of a community event before (and after) the big community event.

This year, however, things were different. For a long time, apparently, nobody from the openSUSE team really bothered about FOSDEM. This I find truly amazing, and a really bad sign with respect to where SuSE and openSUSE are apparently heading. From what I have heard, there was always a bit of a plan to get a bus, but it was unclear where the budget would come from, and no-one took any action. I also heard that two weeks before the event, SuSE employees were asked whether they wanted to go to FOSDEM. Now if you do this 2 weeks before the event, with people who often have a wife and kids these days, most will already have made other plans. Then, if you also state that those people who might be interested also need to have some travel budget left over, and need to get approval in a few days' time, you of course only get a handful of people who end up going to the biggest open source event on the planet. I heard the number 8.

8 people from SuSE went to FOSDEM. An absolute disgrace for what once was the leading European Linux distribution.

Here is an idea: why not make the bus a community service from the start? Why doesn't openSUSE sponsor a bus, one which starts in Nuremberg, perhaps stops at Frankfurt-Flughafen (so some people can grab a smoke and empty their beer-filled bladders), and then continues on to Brussels? Give the SuSE employees 4 weeks' early notice to get their seats reserved (which gives them an incentive to think about FOSDEM early on), and then make seat reservation open to anyone who wants to visit FOSDEM and lives near Nuremberg or Frankfurt. You can even hand the community members a bag of SuSE swag and a Franconian beer.

SuSE would not only be doing something good for their own employees, making it easier for them to visit FOSDEM; they would actually be sponsoring FOSDEM and helping boost their openSUSE community.

I am actually surprised that this has to be said, and that the idea hasn't been spawned from within the openSUSE team itself. But here you go. Now make it happen for next year.
February 05, 2014
Complementing the hw binning support which landed earlier this year, and is now enabled by default, I've recently pushed the initial round of new-compiler work to mesa.  Initially I was going to keep it on a branch until I had a chance to sort out a better register allocation (RA) algorithm, but the improved instruction scheduling fixed so many bugs that I decided it should be merged in its current form.

Or explained another way: ever since Fedora updated to supertuxkart 0.8.1, about half the tracks had rendering problems and/or triggered GPU hangs.  The new compiler fixed all those problems (and more).  And I like supertuxkart :-)

Background:

The original a3xx compiler was more of a simple TGSI translator.  It translated each TGSI opcode into a simple sequence of one or more native instructions.  There was a fixed (per-shader) mapping between the TGSI INPUT, OUTPUT, and TEMP vec4 register files and the native (flat) scalar register file.  A not-insignificant part of the code was relatively generic (in concept, though not in implementation) lowering of TGSI opcodes that relate more closely to old ARB shader instructions (SCS - Sine Cosine, LIT - Light Coefficients, etc.) than to the instruction set of any modern GPU.

The simple TGSI translator approach works fine with simple shader ISAs.  It worked ok for a2xx, other than slightly suboptimal register usage.  But the problem is that a3xx (and a4xx) is not such a simple instruction set architecture.  In particular, the instruction scheduling required that the compiler be aware of the shader instruction pipeline(s).

This was obvious pretty early on in the reverse engineering stage.  But in the early days of the gallium a3xx support, there were too many other things to do... spending the needed time on the compiler then was not really an option.  Instead the "use lots of nop's and hope for the best" strategy was employed.

And while it worked as a stop-gap solution, it turns out that there are a lot of edge cases where "hope for the best" does not really work out that well in practice.  After debugging a number of rendering bugs and piglit failures which all traced back to instruction scheduling problems, it was becoming clear that it was time for a more permanent solution.

In with the new:

First thing I wanted to do before adding a lot more complexity was to rip out a bunch of code.  With that in mind I implemented a generic TGSI lowering pass to replace about a dozen opcodes with sequences of equivalent simpler instructions.  This should probably be made configurable and moved to util; I think most of the lowerings would be useful to other gallium drivers.

Once the handling of the now unneeded TGSI opcodes was removed, I copied fd3_compiler to fd3_compiler_old.  Originally the plan was to remove this before pushing upstream.  I just wanted a way to compare the results from the original compiler to the new compiler to help during testing and debugging.  But currently shaders with relative addressing need to fall back to the old compiler, so it stays for now.

The next step was to turn ir3 (the a3xx IR), which originates from the fdre-a3xx shader assembler, into something more useful.  The approach I settled on (mostly to ease the transition) was to add a few extra "meta-instructions" to hold some additional information which would be needed in later passes, including Φ (Phi) instructions where a result depends on flow control, plus a few extra instruction and register flags, the important one being IR3_REG_SSA, used on src register nodes to indicate that the register node points to the instruction producing its value.  Now what used to be the compiler (well, roughly 2/3rds of it) is the front-end.  Instead of producing a linear sequence of instructions fed directly to the assembler/codegen, the front-end now generates a graph of instructions modified by subsequent passes until we have something suitable for codegen.
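
To picture the IR3_REG_SSA flag, here is an illustrative sketch (not the actual ir3 declarations) of a source register that either names a register or points at the instruction producing its value:

    struct ir3_instruction;   /* opcode, srcs, scheduling info, ... */

    struct ir3_register {
        unsigned flags;                      /* e.g. IR3_REG_SSA */
        union {
            unsigned num;                    /* register number, once assigned */
            struct ir3_instruction *instr;   /* IR3_REG_SSA: pointer to the
                                              * producing instruction */
        };
    };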

For each output, we keep the pointer to the instruction which generates that value (at the scalar level), which in turn has pointers to the instructions generating its srcs/inputs, and so on.  As before, the front end generates sequences of scalar instructions for each (written) component in a TGSI vector instruction.  Although now, instructions whose result is not used simply have nobody pointing to them, so they naturally vanish.

At the same time, mostly to preserve my sanity while debugging, but partially also to make nifty pictures, I implemented an "ir3 dumper" which would dump out the graph in .dot syntax:


The first pass eliminates some redundant moves (some of which come from the front end, some from TGSI itself).  Probably the front end could be a bit more clever about not inserting unneeded moves, but since TGSI has separate INPUT/OUTPUT/TEMP register files, there will always be some extra moves which need eliminating.

After that, I calculate a "depth" for each instruction, where the depth is the number of instruction cycles/slots required to compute that value:

    dd(instr, n): depth(instr->src[n]) + delay(instr->src[n], instr)
    depth(instr): 1 + max(dd(instr, 0), ..., dd(instr, N))

where delay(p,c) gives the required number of instruction slots between an instruction which produces a value and an instruction which consumes a value.
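
In C, that recursion could be sketched roughly as follows (src_count, ssa_src() and delay() are placeholder names, not the actual ir3 helpers, and a real pass would memoize the result rather than recompute it):

    static unsigned
    depth (struct ir3_instruction *instr)
    {
        unsigned d = 0, n;

        for (n = 0; n < instr->src_count; n++) {
            /* only srcs pointing at a producing instruction (SSA) count */
            struct ir3_instruction *src = ssa_src (instr, n);

            if (src) {
                unsigned dd = depth (src) + delay (src, instr);
                if (dd > d)
                    d = dd;
            }
        }
        return 1 + d;
    }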

The depth is used for scheduling.  The short version of how it works is to recursively schedule output instructions with the greatest depth until no more instructions can be scheduled (more delay slots needed).  For instructions with multiple inputs/srcs, the unscheduled src instruction with the greatest depth is scheduled first.  Once we hit a point where there are some delay slots to fill, we switch to the next deepest output, and so on until the needed delay slots are filled.  If there are no instructions that can be scheduled, then we insert nop's.

Once the graph is scheduled, we have a linear sequence of instructions, at which point we do RA.  I won't say too much about that now, since this is already a long post and I'll probably change the algorithm.  It is worth noting that some register assignment algorithms can coalesce unneeded moves.  Although moves factor into the scheduling decisions for the a3xx ISA, so I'm not really sure that this would be too useful for me.

The end result: thanks to a combination of removing scalar instructions which calculate unused TGSI vec4 register components, removing unnecessary moves, and scheduling other instructions rather than filling with no-op's everywhere, for non-trivial shaders it is not uncommon to see the compiler use ~33% of the number of instructions, and half the number of registers.

Testing/Debugging:

Validating compilers is hard.  Piglit has a number of tests to exercise relatively specific features.  But with games, it isn't always the case that an incorrect shader produces (visually) incorrect results.  And visually incorrect results are not always straightforward to trace back to the problem.  I.e. games typically have many shaders and many draw calls; tracking down the problematic draw and its shaders is not always easy.

So I wrote a very simplistic emulator for testing the output of the compiler.  I captured the TGSI dumps of all the shaders from various apps (ST_DEBUG=tgsi).  The test app would assemble the TGSI, feed it into both the old and new compilers, then run the same sets of randomized inputs through the resulting shaders and compare outputs.

There are a few cases where differing output is expected, since the new compiler has slightly more well defined undefined behaviour for shaders that use uninitialized values... to avoid invalid pointers in the graph produced by the front-end, uninitialized values get a 'mov Rdst, immed{0.0}' instruction.  So there are some cases where the resulting shader needs to be manually validated.  But in general this let me test (and debug) the new compiler with 100's of shaders in a relatively short amount of time.

Performance:

So the obvious question, what does this all mean in terms of performance?  Well, start with the easy results, es2gears[1]:
  • original compiler: ~435fps
  • new compiler: ~539fps
With supertuxkart, the result is a bit easier to show in pictures.  Part of the problem is that the tracks that are heavy enough on the GPU to not be purely CPU limited didn't actually work before with the original compiler.  That, plus the fact that, as far as I know, there is no simple benchmark mode which spits out a number at the end, as with xonotic.  So I used the trace points + timechart approach mentioned in a previous post.

    supertuxkart -f --track fortmagma --profile-laps=1

I manually took one-second-long captures, in as close to the same spot as possible (just after the light turns green):

    ./perf timechart record -a -g -o stk-apq8074-opt+bin-1.data sleep 1

In this case I was running on an apq8074/a330 device, fwiw.  Our starting point is:


Then once hw binning is in place, we are starting to look more CPU limited than anything:


And with addition of new compiler, the GPU is idle more of the time, but since the GPU is no longer the bottleneck (on the less demanding tracks) there isn't too much change in framerate:


Still, it could help power if the GPU can shut off sooner, and other levels which push the GPU harder benefit.

With binning plus the improved compiler, there should not be any more huge performance gaps compared to the blob compiler.  Without Linux blob drivers, there is no way to make a real apples-to-apples comparison, but the remaining things that could be improved should be a few percent here and there.  Which is a good thing.  There are still plenty of missing features and undiscovered bugs, I'm sure.  But I'm hopeful that we can at least have things in good shape for a3xx before the first a4xx devices ship ;-)


-----
[1] Windowed apps would benefit somewhat from XA support in the DDX, avoiding the stall waiting for the GPU to complete before the sw blit (memcpy) to the front buffer... but the small default window size for 'gears means that hw binning does not have much impact.  The remaining figures are for fullscreen 1280x720.

Now that my body is well on its road to recovery, it's time to talk about the great success that FOSDEM was, once again.

We had a really nice devroom and pretty good crowds, but the most amazing thing was the recordings and the livestream. The FOSDEM organizers really outdid themselves there.

After the initial announcement of the graphics devroom went out, Martin Peres contacted me and we talked about getting proper recordings from the DevRoom, and briefly looked into what this would end up costing if we were to buy the equipment for it. Our words were barely cold when the FOSDEM organizers announced what can only be seen as absolutely insane: they would try to provide for full recording of everything.

In the end, FOSDEM provided recording in 22 rooms. They had to get 22 mixing desks, 22 sets of microphones, 22 cameras, 44 laptops... This was apparently mostly rented, and apparently, surprise, surprise, there aren't many rental companies which have such an insane amount of kit available.

Apart from some minor issues (like a broken FireWire cable in the Wine devroom), things worked out amazingly. Only the FOSDEM guys could pull something this insane off. We had all our talks in the Graphics DevRoom streamed out live, with no issues at all.

I would like to thank all the speakers in the graphics devroom, but I particularly would like to thank Martin Peres, who took full control and responsibility of the video equipment. Then there are Marcus Meissner and Thierry Reding, who willingly sat in the back of the devroom and handled the recordings themselves, directing the streams, for only meagre rations of water and cookies. Without people stepping up and doing their bit, a devroom like this would not be possible. And the same goes for the whole of FOSDEM.

At the end of the final talk, after I had talked about sunxi_kms, I tried to thank the FOSDEM organizers, and get the remaining audience to clap for them. But I mostly stood there and babbled, at a loss for words, because what the FOSDEM organizers had achieved with this insane goal is simply amazing. And thinking about it now, I still get quite emotional...

How on earth did they manage to pull this off, on top of organizing the rest of FOSDEM, a FOSDEM which caters for something like 8000 people as it is... It's just not possible!

It's not as if these guys get paid for what they are doing; FOSDEM is a low-budget organization, purely based on volunteers. The absolute core of the organization is just a handful of people who have very busy jobs. And yet, they have succeeded where any other organization would have failed. There's no politics or powerplay, there is no money or commerce. There is just the absolute drive to make FOSDEM the best event on the planet, by making small changes every year...

This year's was my 7th devroom, and if I can help it there will be another one next year. I am really proud that I am allowed to do my, comparatively little, part as well. Every Sunday evening after FOSDEM, after we sit down in the restaurant with the remainder of the graphics devroom, I am physically broken, but I am also one of the happiest people on the planet...

Each year, no matter what happened in the year before, no matter what nasty open source politics or corporate nonsense took place over that year... Each year, the FOSDEM organizers prove that something amazing can happen if only people do their bit, if only people work towards the same selfless goal. Each year, FOSDEM reminds me of why I do what I do, and why I need to keep on doing it.

Thank you.
It's been some time in the making, with the redesign work started a couple of release cycles ago, but we finally reached a state where it's usable, and leaps and bounds easier to use than the previous versions.

I should note that I use Totem and Videos interchangeably: Totem is still the name of the project and the code repository, but the user-visible name is Videos (or GNOME Videos if differentiation is necessary).

Discovery

The old UI made it particularly hard to consume media from various web sites, as you can see from the screenshot below. It's cramped: we had separate sidebars for search and browse, we didn't show icons for browse, etc.


And here's the new UI, browsing the same source (Apple Trailers).


This definitely makes it easier to find media. Totem also had a number of specific plugins to find media sources, usually from third-party developers. We don't support those anymore; if you have been writing such a plugin, you should port it to grilo, now a hard dependency.

I've also spent some time working on Grilo and its plugins, creating a few new sources along the process.


Amongst the new ones are the Freebox TV plugin, the under-powered Guardian Videos source, and the not-yet-fully-integrated Pocket videos list. Don't forget to check for blocker bugs if you're trying to test those!

Playback

This is also a pretty big upgrade. We now have a video-specific menu, the gear menu, and better-looking pop-ups. This matches the design used in GNOME Documents for sliders. It's also better suited for touch: a mouse move will show the OSD for a short time, but a touchscreen tap will show the OSD until you tap it again.


The older version had some features only available in windowed mode, such as rotation or zoom, and some we tried to cram into the context menu (subtitle or sound track selection, for example). Here, there's no loss of access to features; they're all in the same gear menu, whether fullscreen or windowed.

Bugs, bugs, bugs

With a few valiant testers and designers, we tried to fix a number of bugs. This release doesn't mean Videos is bug-free, far from it, but it's certainly robust and usable enough to make this development release.

There are some theming bugs, as can be seen in that last screenshot's previous/next buttons; there are bugs in grilo and grilo-plugins, and there are bugs in Videos itself.

Do file bugs when you see something amiss, it'll help designers and myself move items from our own TODO lists :)

What's next

A lot :)
You can see some of those in the design wireframes.

"Make available offline" is something I have a great interest in, especially coupled with the Pocket source. Selecting a bunch of items to watch later, on the train, or on the plane.

Better metadata, especially for films and series. This isn't just for films you snarfed from Usenet and torrent sites either. The already existing Rai.tv source has a number of films, and a BBC iPlayer source is planned.

Finally, remote playback, to "throw" videos from your laptop to the TV. Controls should still work, and we'll want to allow browsing through sources when playback is remote.

Notes on development

Half notes, half thanks. As mentioned in the introduction, this release has been some time in the making, but it also comes at a time when we've had the necessary plumbing to make all this possible.

To name but a few, we've made good use of gnome-documents' widgets to list videos, the GtkStack, GtkRevealer and GtkPopover. The GtkSearchBar and GtkSearchEntry widgets are also examples of widgets that moved to GTK+ for Videos' development.

Getting it

Soon in your development distributions, in totem's master git branch, and in GNOME's FTP server.
February 04, 2014
So earlier this year I signed up Jani Nikula officially as co-maintainer. Now we've also gotten around to moving the drm/i915 git repository to a common location, so that there's no longer a need to move it around when I'm on vacation. So everyone please update any git references and remotes:

git://anongit.freedesktop.org/drm-intel
ssh://git.freedesktop.org/git/drm-intel
February 01, 2014

The slides from my FOSDEM talk are now available.

Missing FOSDEM

I’m afraid Eric and I won’t be at FOSDEM this weekend; our flight got canceled, and the backup they offered would have gotten us there late Saturday night. It seemed crazy to fly to Brussels for one day of FOSDEM, so we decided to just stay home and get some work done.

Sorry to be missing all of the fabulous FOSDEM adventures and getting to see all of the fun people who attend one of the best conferences around. Hope everyone has a great time, and finds only the best chocolates.

January 31, 2014
In the last two (or three?) weeks at Collabora I have been looking into a Wayland protocol extension that would allow accurately timed presentation. Accurate timing is essential for two quite different use cases: video playback with audio/video synchronization, and interactive GUIs with just-in-time redrawing. Video playback with A/V sync was our primary goal when we started working on this, and there is Frederic Plourde's first proposal from October 2013. Since then I have realized that other kinds of applications need timings too, especially feedback on when their content updates were shown, and when the next chance to show an update (vblank) is. My re-design primarily started with the aim of improving resizing performance, when I got the assignment from Daniel Stone to push the Wayland presentation protocol forward. The RFC v2 of the Wayland presentation extension is now out for review and discussion.

I looked at various timing- and content-posting-related APIs, like EGL and its extensions, GLX_OML_sync_control, and the new X11 Present and Keith's blog posts about it. I found a couple of things whose purpose I could not understand, and asked about them on a few mailing lists. The replies and further pondering led to the conclusion that I do not have to support the MSC modulus matching logic, and that eglSwapInterval for intervals greater than one could be implemented if anyone really needs it.

I took a more detailed look at X11 Present for XWayland purposes. The Wayland presentation protocol extension I am proposing is not meant to solve all problems in supporting X11 Present in XWayland, but the investigation gave me some faith that with small additional XWayland extensions it could be done. Axel Davy is already implementing Present on core Wayland protocol as far as it is possible anyway, and we had lots of interesting discussions.

I am not going into any details of the RFC v2 proposal here, as the email should contain exhaustive documentation on the design. If no significant flaws are found, the next steps would be to implement this in Weston and see how it works.


For the third time in a row, I’ll be giving a talk at FOSDEM this year:
LTE in your Linux based system

The talk is an introduction to QMI and MBIM devices, how they are exposed by the Linux kernel, and how you can use them from the command line through libqmi, libmbim and ModemManager.

See you there!


January 29, 2014

Multidimensional arrays are added to GLSL via either the GL_ARB_arrays_of_arrays extension or GLSL 4.30. I've had a couple of people tell me that the multidimensional array syntax is either wrong or just plain crazy. When viewed from the proper angle, it should actually be perfectly logical to any C / C++ programmer. I'd like to clear up a bit of the confusion.

Starting with the easy syntax, the following does what you expect:

    vec4 a[2][3][4];

If a is inside a uniform block, the memory layout would be the same as in C. cdecl would call this "declare a as array 2 of array 3 of array 4 of vec4".
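
To see why, here is a C mock-up of that layout; this sketch only lines up with std140 because a vec4 is naturally 16 bytes, so the array strides happen to match:

    typedef struct { float x, y, z, w; } vec4;  /* 16 bytes, like GLSL's vec4 */

    vec4 a[2][3][4];  /* 24 contiguous vec4s, the last index varying fastest,
                         the same order as the GLSL declaration above */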

Using GLSL constructor syntax, the array can also be initialized:

    vec4 a[2][3][4] = vec4[][][](vec4[][](vec4[](vec4( 1), vec4( 2), vec4( 3), vec4( 4)),
                                          vec4[](vec4( 5), vec4( 6), vec4( 7), vec4( 8)),
                                          vec4[](vec4( 9), vec4(10), vec4(11), vec4(12))),
                                 vec4[][](vec4[](vec4(13), vec4(14), vec4(15), vec4(16)),
                                          vec4[](vec4(17), vec4(18), vec4(19), vec4(20)),
                                          vec4[](vec4(21), vec4(22), vec4(23), vec4(24))));

If that makes your eyes bleed, GL_ARB_shading_language_420pack and GLSL 4.20 add the ability to use C-style array and structure initializers. In that model, a can be initialized to the same values by:

    vec4 a[2][3][4] = {
        {
            { vec4( 1), vec4( 2), vec4( 3), vec4( 4) },
            { vec4( 5), vec4( 6), vec4( 7), vec4( 8) },
            { vec4( 9), vec4(10), vec4(11), vec4(12) }
        },
        {
            { vec4(13), vec4(14), vec4(15), vec4(16) },
            { vec4(17), vec4(18), vec4(19), vec4(20) },
            { vec4(21), vec4(22), vec4(23), vec4(24) }
        }
    };

Functions can be declared that take multidimensional arrays as parameters. In the prototype, the name of the parameter can be present, or it can be omitted.

    void function_a(float a[4][5][6]);
    void function_b(float  [4][5][6]);

Other than the GLSL constructor syntax, there hasn't been any madness yet. However, recall that array sizes can be associated with the variable name or with the type. The prototype for function_a associates the size with the variable name, and the prototype for function_b associates the size with the type. Like GLSL constructor syntax, this has existed since GLSL 1.20.

Associating the array size with just the type, we can declare a (from above) as:

    vec4[2][3][4] a;

With multidimensional arrays, the sizes can be split between the two, and this is where it gets weird. We can also declare a as:

    vec4[3][4] a[2];

This declaration has the same layout as the previous two forms. This is usually where people say, "It's bigger on the inside!" Recall the cdecl description, "declare a as array 2 of array 3 of array 4 of vec4". If we add some parentheses, "declare a as array 2 of (array 3 of array 4 of vec4)", things seem a bit clearer.

GLSL ended up with this syntax for two reasons, and seeing those reasons should illuminate things. Without GL_ARB_arrays_of_arrays or GLSL 4.30, there are no multidimensional arrays, but the same effect can be achieved, very inconveniently, using structures containing arrays. In GLSL 4.20 and earlier, we could also declare a as:

    struct S1 {
        float a[4];
    };

    struct S2 {
        S1 a[3];
    };

    S2 a[2];

I'll spare you having to see the GLSL constructor initializer for that mess. Note that we still end up with a[2] at the end.

Using typedef in C, we could also achieve the same result using:

    typedef float T[3][4];

    T a[2];

Again, we end up with a[2]. If cdecl could handle this (it doesn't grok typedef), it would say "declare a as array 2 of T", and "typedef T as array 3 of array 4 of float". We could substitute the description of T and, with parentheses, get "declare a as array 2 of (array 3 of array 4 of float)".

Where this starts to present pain is that function_c has the same parameter type as function_a and function_b, but function_d does not.

    void function_c(float[5][6] a[4]);
    void function_d(float[5][6]  [4]);

However, the layout of the parameter for function_e is the same as for function_a and function_b, even though the actual type is different.

    struct S3 {
        float a[6];
    };

    struct S4 {
        S3 a[5];
    };

    void function_e(S4 [4]);

I think if we had it to do all over again, we might have disallowed the split syntax. That would remove the more annoying pitfalls and the confusion, but it would also remove some functionality. Most of the problems associated with the split are caught at compile-time, but some are not. The two obvious problems that remain are transposed array indices and incorrectly calculated uniform block layouts.

    layout(std140) uniform U {
        float [1][2][3] x;
        float y[1][2][3];
        float [1][2] z[3];
    };

In this example x and y have the same memory organization, but z does not. I wouldn't want to try to debug that problem.

January 28, 2014

The problem

While working on a side project I was confronted with the problem of serving, to potentially tens of thousands of users, a JSON object representing the state of the app (think following the results of an election live, where you want the data to be updated every second).

The web service is really simple: it grabs the payload from the data store (disk/Redis/Mongo), reloads it every second, and serves the content accordingly.

Initially I implemented this using Python and Flask. I must admit that I didn't try too hard to get the data in an async fashion, since it wasn't obvious how to do this with Flask. Using Twisted could have been a better choice.

I then attempted to use Varnish as a front-line HTTP cache, with promising results; however, HTTPS soon became a requirement (due to CORS policies), so I had to discard Varnish rather quickly.

Go's bare HTTP server performance


However, I took the opportunity to teach myself Go, which I knew had pretty good support for concurrent operations, and the Google guys already have a success story serving static data with it. Boy, have I been surprised.

Take the simplest form of an HTTP server that serves a common JSON payload for all the requests:

Gist link

Note that it reads the data just once from a JSON file on disk; no updates every second, for the sake of showing how well Go can do as a static HTTP server. These are the performance results using httpress on my ThinkPad T430s (i7-3520M 2.90GHz/8GB of RAM):

$ ./httpress -n 100000 -c 1000 -t 6 -k http://localhost:8080
TOTALS:  1000 connect, 100000 requests, 100000 success, 0 fail, 1000 (1000) real concurrency
TRAFFIC: 4195 avg bytes, 109 avg overhead, 419532658 bytes, 10900000 overhead
TIMING:  3.752 seconds, 26646 rps, 112007 kbps, 37.5 ms avg req time

That's 26646 requests per second, with 1000 concurrent keep-alive connections in 6 parallel threads, without doing anything special. I must say that I'm impressed that the Go developers have put so much effort into their built-in HTTP server implementation.

Go's HTTPS performance


Now that Go started to look like a viable option, let's see how it behaves as an HTTPS server. Here's the link to the modifications to turn the previous example into an HTTPS service; note that you have to generate the certificates.

TOTALS:  1000 connect, 100000 requests, 99862 success, 138 fail, 862 (862) real concurrency
TRAFFIC: 3994 avg bytes, 109 avg overhead, 398848828 bytes, 10884958 overhead
TIMING:  22.621 seconds, 4414 rps, 17688 kbps, 226.5 ms avg req time

Damn you, secure webs! That's a big downgrade in performance... 4k rps now. This made me realize that this new security scheme for cross-site content implemented in browsers is going to kill a lot of trees. At this point I wanted to investigate further possibilities to boost this up.

Revisiting the front-line cache idea: Cherokee

My suspicion is that the TLS handling inside Go's net/http has room for improvement, but I didn't want to get my hands dirty there, so I started to think about off-the-shelf alternatives.

I stumbled upon Cherokee's front-line cache by reverse HTTP feature, which is really easy to set up thanks to the cherokee-admin interface. This means that I run my original plain-HTTP Go implementation as-is behind Cherokee, which handles the TLS by itself.
These are the results:

TOTALS:  1523 connect, 100000 requests, 99511 success, 489 fail, 1000 (981) real concurrency
TRAFFIC: 8635 avg bytes, 142 avg overhead, 859277485 bytes, 14131208 overhead
TIMING:  12.975 seconds, 7669 rps, 65735 kbps, 130.4 ms avg req time

There's quite an improvement in latency, though the requests-per-second improvement, while worthwhile, is not as big as I would have expected.

Conclusions

These are my findings so far. I want to try to configure Cherokee so that it only talks to the backend server once every second, caching the requests that way; I haven't figured that out yet.

As for my experience with Go, it has been mostly pleasant: it's nice to have a mix of low-level and high-level features in the language, though it has some things that are quite confusing for a C/Python/Vala guy like myself. Eventually I will post some notes about the synchronization primitives I've used in the implementation of the service that gets the data from Redis every second.

Hey,

During my Google Summer of Code 2013, part of my project was to reverse engineer GPU graphics counters on NVIDIA Tesla. However, these counters are only exposed on Windows, through the NVIDIA NVPerfKit performance tools.

Usually the Nouveau community uses envytools, a collection of tools to help developers understand how NVIDIA GPUs work. Envytools depends on libpciaccess, which was previously only available on POSIX platforms. That’s why I decided to port libpciaccess to Windows/Cygwin, to be able to use these tools.

This port depends on WinIo which allows direct I/O port and physical memory access under Windows NT/2000/XP/2003/Vista/7 and 2008.

This port has been accepted into libpciaccess/master and was merged today. It has only been tested on 32-bit Windows 7, and still has to be checked and fixed for 64-bit.

To use it, please follow the instructions found in README.cygwin.

This support helped me understand how GPU graphics counters work on NVIDIA Tesla. I have started writing documentation for these counters here.

See you later!


I noticed an article on Phoronix today about libinput, which made me think I should post a little Wayland update again. libinput is developed by Peter Hutterer, who is part of the Graphics team here at Red Hat and our resident input expert; he is developing it as part of our work to get Fedora Wayland ready.

That said, input is a complex area, and if we do end up without a Wayland option at feature parity with the X.org option in Fedora 21, then not having gotten input sorted in time is the most likely scenario. We are still charging ahead with the goal of getting things ready for Fedora 21, but in our last status meeting we did flag input as our biggest risk. Luckily Peter is not alone in his efforts on libinput: there is a healthy amount of community contributions, and at Red Hat, Hans de Goede has recently joined Peter on input. So we are doing our utmost to make sure the pieces fall into place.

Our minimum todo list for input, which will enable Wayland to be on par with X for at least 90% of our users, is as follows:

  • keyboard works as-is
  • mouse supports left/right-handed button mapping
  • mouse/touchpad middle mouse button emulation
  • touchpad scrolling and tapping
  • touchpad software-emulated buttons work on clickpads
  • touchpad disable-while-typing

But there are of course other items beyond this minimum list too, like Wacom tablet support, which is of interest to a much smaller segment of our users, but still important to get done. We might have to push some of these more niche items onto the F22 timescale.

Also, if anyone is wondering about game pads and such: we don’t have any concrete plans around them currently in the context of Wayland. When we spoke to Valve and the SDL team, they told us they currently access game controllers directly through the kernel interfaces and preferred to keep doing that. So we decided not to try to second-guess them on this, considering they’ve been doing game development for years and we haven’t :)