Wednesday, December 04, 2013

OMTC for Windows nightly users

I just landed a patch flipping the switch for all Windows users with HWA (d3d9/10/11) to use off-main-thread compositing. This is a fairly big change to our rendering pipeline, so if you notice rendering issues on Windows, please file bugs.

For now, only Nightly users will get this change. Riding the trains depends on bugs 913503 and 904890, and on general stability. We wanted to land this early to get some extra testing time, and because the code rots super quickly when it is not being tested. I will arrange for a branch to keep testing main-thread composition ASAP.

One known issue is windowed plugins lagging during scrolling (bug 913503), so please ignore that (for now) if you observe it.

OMTC can be disabled by setting the 'layers.offmainthreadcomposition.enabled' pref to false. If there are more problems than we can fix relatively quickly, we can do this for all users very easily.

This is the culmination of months of work, so it is really exciting to see it finally land. I fully expect to have to turn it off again under a tidal wave of unforeseen bugs, but such is life.

Tuesday, November 26, 2013

No more main-thread OpenGL in Firefox (important note for Linux users who use OpenGL)

Main-thread compositing with OpenGL is no more (bug 924403). The only option for OpenGL compositing is off-main-thread compositing (OMTC). This is the first big chunk of code removal following on from the layers refactoring, which began more than a year ago. It is very nice to finally be removing code and reducing the number of code paths in the graphics module.

Most users should not notice a difference. All supported configurations which use OpenGL (FirefoxOS, Android, modern OSX) already use OMTC. If you use OpenGL on Linux, however, read on.

OpenGL in Linux

OpenGL on Linux is not a supported configuration (i.e., fixing bugs is not a priority - we would love some volunteer help here, by the way; get in contact if you're keen). However, if you have good luck with your drivers, then it works pretty well and can be enabled by setting the 'layers.acceleration.force-enabled' pref to true. The main benefit is improved WebGL performance. OMTC on Linux also works pretty well, but is likewise not a supported configuration. If you want to continue using OpenGL on Linux in Firefox 28 and later, you will need to use OMTC. (Note that if you are trying OpenGL on Linux for the first time, you should use a new profile to make it easier to undo if your drivers are not cooperative. And be prepared for your system to crash, potentially.)

Nightly users who currently get OpenGL will automatically get OMTC. That is, on Nightly, setting 'layers.acceleration.force-enabled' on Linux will get you OpenGL with OMTC.

For Aurora, Beta, and Release users (once version 28 hits those channels), you will also need to set the environment variable 'MOZ_USE_OMTC' and the 'layers.offmainthreadcomposition.enabled' pref (as well as the 'layers.acceleration.force-enabled' pref) if you want OMTC. Otherwise, you will get basic layers (software composition, although usually hardware accelerated by X).
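
To make the combination of switches concrete, here is a tiny sketch of the gating logic (purely illustrative - the pref and environment variable names are the real ones described above, but GetBoolPref is a stand-in for reading about:config, not the actual gfxPlatform code):

#include <cstdlib>
#include <map>
#include <string>

// Illustrative stand-in for reading about:config; not a real Gecko API.
static std::map<std::string, bool> gPrefs = {
  {"layers.acceleration.force-enabled", true},
  {"layers.offmainthreadcomposition.enabled", true},
};

static bool GetBoolPref(const std::string& aName) {
  auto it = gPrefs.find(aName);
  return it != gPrefs.end() && it->second;
}

// Roughly the decision for Firefox 28 on Aurora/Beta/Release Linux builds:
// OpenGL layers must be forced on, and *both* the OMTC pref and the
// MOZ_USE_OMTC environment variable must be set; otherwise you fall back
// to basic layers.
static bool LinuxBuildGetsOMTC() {
  if (!GetBoolPref("layers.acceleration.force-enabled")) {
    return false;  // no OpenGL layers at all
  }
  return GetBoolPref("layers.offmainthreadcomposition.enabled") &&
         std::getenv("MOZ_USE_OMTC") != nullptr;
}

int main() { return LinuxBuildGetsOMTC() ? 0 : 1; }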

Sunday, November 10, 2013

Iterative Development

Modern software engineering 'theory' is all about being iterative. Each new methodology seems to be more iterative than the last (contrast agile with the waterfall process, for example). It is widely acknowledged that iterative development is better than 'big bang' development.

Open source development (by which I mean the open source development 'process', as opposed to just developing software under an open source licence) is intrinsically iterative - we have a huge lump of code which we add to (or subtract from) one small (hopefully) patch at a time.

I believe that making our development of each piece (bug, issue, patch, project, whatever) iterative rather than 'big bang' is important, but difficult. One attribute of the engineers I admire at Mozilla is that they achieve this. I try, but often fail, to do so. I think it is important to be iterative because it makes it easier to catch bad design decisions earlier, makes estimation and progress assessment easier (and thus it is easier to keep interested parties happy), and, I think (and somewhat counter-intuitively), it actually results in better design in the long run. That last one is (in my fuzzy beginnings of a theory) because the developer needs a better understanding of the problem upfront to be able to identify the iterative steps required. And, as you take these steps, that understanding (and thus the solution, including earlier stages) improves.

I don't really know how to be more iterative (I'd love to hear ideas in the comments). But, I do know you have to be pretty aggressive (intellectually) in sticking to the iterative path - saving things up and landing in a big bang is often the path of least resistance.

Friday, November 01, 2013

Paris work week

I just returned from another work week, this time in Paris. Coming after the summit and some personal travel it has been a particularly exhausting month. But, as always, the work week was great. It is amazing to be able to work with such an awesome group of friendly, dedicated, cooperative, and inspirationally smart people (except gw280, he's awful). I am very lucky to work at Mozilla and with my team - being paid to work on such interesting and high impact problems with such good people is a real privilege.

Wednesday, October 30, 2013

Syntax IS important

In the domains of programming language theory and design I often see it stated that 'syntax is not important'. Syntax is believed to be superficial, well understood, and easily changed, as opposed to the semantics of a language, which are interesting and fundamental. That is fine in the world of programming language theory (I have made similar statements myself). However, I occasionally see the same statement made about programming language design, usually by PLT folks. In that circumstance, I believe it is wrong.

Syntax may be theoretically dull, but it is what is staring the programmer in the face every second they are using a language. The subtlest tweaks can have profound effects on how easy a language is to read and write. Language users get all excited and argumentative about tiny elements of syntax because it is important, not because they are cretins. Syntax is the PL equivalent of a library's API. It doesn't matter how great a library is: if its API sucks, the library sucks. Likewise with PLs: a language might have the greatest semantics in the world, but if the syntax sucks, then it will not be nice to use and will not get uptake (unless there is some seriously motivating use case).

Of course what makes a good syntax is subjective, and highly dependent on what syntaxes an individual is familiar with. But that makes designing a good syntax more difficult, not less important.

Thursday, October 10, 2013

Summit 2013

Last weekend was the Mozilla summit. I attended the Toronto event. It was awesome. Also: exhausting, fun, informative, interesting, inspiring, sometimes awkward, sometimes over the top. A big thank you is due to the organisers. They did a fantastic job on an event of this size, everything was well planned and went smoothly and there was a nice balance of talks, free time, etc. That is really hard to get right, so kudos.

It was great to meet lots of Mozillians outside the rendering team; that was probably the best thing about the event. Many of the technical sessions were really interesting too.

Now I just have to get over the jet lag and exhaustion from all the socialising (being sociable is hard) and get stuck into coding again.

Monday, September 09, 2013

A real-life Heisenbug

I spent three or four days debugging (a while ago now, this blog post has been aging for longer than expected in my blog cellar) an interesting and hard-to-find bug on Windows. It became apparent due to intermittent failures on Try with Windows OMTC builds. Finding the root cause was an adventure and turned out to have nothing much to do with compositing or graphics. All the action is in bug 896896, if you are interested to see some code.

A Heisenbug is a bug whose behaviour is changed by observation. In this case it would go away if I set more than a few breakpoints (breakpoints which print something and continue, not actually breaking, which also caused the bug to disappear). That made debugging annoying: I was reduced to using printfs for logging, and even that sometimes affected the frequency of reproduction.

The failure occurred in mochitests with a drop-down select box. About one in five test runs was timing out and thus failing. They timed out with a crash, but there was no useful crash information. There were a few mochitests which failed, and the thing they had in common was a drop-down list box. The first step was to try to reproduce this locally. For the longest time I could not. I tried using test pages with list boxes: all worked fine. I tried to run the mochitests in question and they always succeeded. The simplest of the tests simulated clicking on the select box to open and close it. By accident I noticed that sometimes the mochitest would freeze, with the list box dropped down, until I moved the mouse. A bit more experimentation showed that if the cursor was outside the window then I could move it and the test would remain frozen. A bit more experimentation and I found this was specific to OMTC (it never happened with main-thread composition) and didn't depend on the actual compositor (d3d9 and d3d11 had the same problem). So, it seemed to be a problem with simulated clicks on select boxes. I tried attaching a debugger and breaking when the mochitest froze. This didn't reveal anything interesting - the compositor thread was waiting for messages and the main thread was somewhere doing Windows message stuff or sometimes GC (which is where it usually is if you randomly pause Firefox).

I then spent a day or so figuring out exactly how those simulated clicks worked. The mochitest did four clicks, and usually we got stuck between the third and fourth. The test set a 500ms timeout in JS and we never got to the fourth click, which should happen when the timeout fires. It took me a while to figure this out, of course. At first it looked like a problem in the code which rolled up the select box, in part because we hit a different code path there depending on whether it is a real click or a simulated click.

So the question now was: why wasn't our timeout completing? More logging confirmed that it was being set correctly and (eventually, in the freezing case) that it correctly called back. Investigating with Spy++ showed that when we froze, we processed a WM_PAINT message and then just stopped. So why was our message queue empty? Well, if the mouse isn't in the window and nothing much else is happening, then it should be empty, and Firefox should just wait until it gets a new message. Which is what was happening, and explains why it woke up with mouse movement. But why weren't we waking up for the timeout?

Firefox waits for a message by calling the Windows function WaitMessage. That puts our thread to sleep until a message appears in our queue. We were frozen here, waiting for a message to come in. Unfortunately, there are other ways to check the queue. You can call PeekMessage, which reads a message from the queue and either pops it off or leaves it in place. The sad thing is that if it leaves the message in place, it still counts as 'read'. The contract for WaitMessage is a bit subtle - if there is anything in the queue that is unread, it returns immediately. If the queue is empty or (and this is the nasty bit) contains only read messages, it sleeps until another message comes in. So, you must call PeekMessage just before you call WaitMessage to ensure that the queue is really empty.
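
In code, that idiom looks roughly like this (a simplified sketch of a Win32 message pump, not the actual Firefox event loop):

#include <windows.h>

// Simplified main-thread pump (illustrative only). The PeekMessage loop
// immediately before WaitMessage is what keeps this correct: it drains
// everything that has arrived since we last looked, so WaitMessage only
// sleeps when the queue is genuinely empty.
void PumpMessagesOnce()
{
  MSG msg;
  while (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE)) {
    TranslateMessage(&msg);
    DispatchMessage(&msg);
  }
  // The queue now contains nothing this thread has not already seen, so go
  // to sleep until a new message is posted or sent to us. If some other
  // PeekMessage call has already marked a pending message as 'read', this
  // will NOT wake up for it - which is exactly the trap described below.
  WaitMessage();
}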

The bug here was that another thread (the compositor thread) was also calling PeekMessage. We couldn't avoid doing this; the call comes from deep in the thread management code. We freeze when, in our main-thread message loop, we check that the queue is empty (and it is), then we context switch to the compositor thread, which also checks the message queue, but by this point our wake-up call has arrived. That gets checked by the compositor thread's event loop and is ignored and left on the queue in the 'read' state. When we context switch back to the main thread, we call WaitMessage; there are no unread messages, so we sleep forever. It might seem unlikely that a message arrives and is checked between the queue being checked and the WaitMessage call. But calling PeekMessage doesn't just tell you whether any messages have been posted to the queue; it also handles any sent messages (Windows has posted (async) and sent (sync) messages). That means that during the PeekMessage call a whole lot of unknown stuff can happen. I'm a bit hazy about exactly how this bit worked out: possibly we got a paint message and that caused us to do a sync composite, which context switched to the compositor thread, which checked its message loop and then PeekMessaged the wake-up call. In any case, this happened frequently enough to cause intermittent failure. It was a Heisenbug because, by adding debugger overhead, we got the wake-up message entirely after the composite (I'm guessing here, btw), so I saw the bug less often or not at all.

The fix was also kind of interesting, but in a 'knowing the details of the Windows APIs' rather than a 'solving a mystery' kind of way. It turns out you can call MsgWaitForMultipleObjectsEx in such a way that it returns even if there are only read messages in the queue, and thus sidestep the whole business.
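
Something like this (illustrative; see bug 896896 for the real patch). The MWMO_INPUTAVAILABLE flag is the key bit - it makes the wait return when there is any input in the queue, even input that some PeekMessage call has already 'seen':

#include <windows.h>

// Instead of WaitMessage(), wait like this. With MWMO_INPUTAVAILABLE the
// call returns if input exists for the queue, even if that input has
// already been observed (but not removed) by PeekMessage - exactly the
// case that made WaitMessage sleep forever.
void WaitForNextMessage()
{
  MsgWaitForMultipleObjectsEx(0,            // no kernel handles to wait on
                              nullptr,
                              INFINITE,     // block until input arrives
                              QS_ALLINPUT,  // wake for any kind of message
                              MWMO_INPUTAVAILABLE);
}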

Tip of the hat to roc and bsmedberg who helped a lot with the diagnosis and cure of this problem. I would have been stuck without their Windows knowledge and Firefox-workings-insight.

Saturday, September 07, 2013

An ode to Sublime Text

I use Sublime Text as my editor for about 80% of the work I do (on Windows I sometimes use Visual Studio, because I debug with it and the integration with the debugger is convenient; even then I often switch back to Sublime Text because it is so much nicer). Sublime Text is pretty much a perfect piece of software. It is not often I get that warm glow from using software that does exactly what I want it to, but Sublime is one of those times. Here are some good things about it:
  • It does one thing and it does it well (editing text for programmers).
  • It is super fast and super stable - I've only ever seen it jank once (when searching through an 80MB file, which is understandable) and I've never had it crash.
  • It is beautiful - seriously, it looks really nice. That is important to me if I am going to spend most of my day looking at it.
  • It is multi-platform (but not, unfortunately, open source (so not quite perfect) so I can't use it on Linux/ARM).
  • It is extremely customisable. I have not found anything I want to change and can't.
  • It is easy to set up your customisations across platforms because they are stored in a plain text file.
  • It is smart (fuzzy search, etc.).
  • It has keyboard shortcuts for everything and they are all the ones you would expect (except maybe ctrl+t which I often hit trying to open a new tab, but hey, that is customisable too).
  • It's extensible with plugins.
Notice that none of the above are about editing text. Editing text is really nice too. But Sublime Text does all the 'meta' level things perfectly. That is necessary (but not sufficient) for a great piece of software. Of course you have to get the primary function right (and Sublime Text does), but so many pieces of software do their core function nicely but fail on the meta-level stuff.

Anyhow, if you haven't tried Sublime Text, you should. And if you are making software, you should strive to make it as nice as Sublime Text is.

Thursday, August 22, 2013

Things I like vs things that make me work better.

One thing about working remotely is that you get more opportunity to customise your work environment. This has taught me that things that I like, or that make me feel better, don't necessarily make me work better. This should not have come as a surprise, really, but it did.

Two examples: I always thought of myself as a morning person - I like getting up early and not staying up late, and I feel good when I wake up early. But I realised I am much more productive later in the day. It doesn't seem to be the number of  hours after I wake up, it is about actually being later in the day. So I work better if I get up later and work later. Unfortunately I don't often get to do that, but I do when I can.

Next, I prefer my work space to be light and bright. But I actually work better in a pretty dim environment. I have to take regular breaks to get some natural light though, otherwise I just feel like I'm turning into a troll.

Just some random thoughts.

Friday, June 28, 2013

Integrating Mercurial queues and Dropbox

We use Mercurial to manage the Mozilla source tree. Locally I use the Mercurial Queues (mq) extension to manage my patch queue. Mq keeps a stack of patches and provides commands (qnew, qpush, etc.) for applying, updating, and managing patches. The patches are kept in a 'patches' subdirectory in the project's .hg directory. Also in that directory are two files of meta-data - 'status' and 'series'. The former is binary data and keeps track of which patches are currently applied. The latter is text and contains an ordered list of your patches. By fiddling with the 'series' file you can manually add, remove, and re-order patches in a queue.

I have several real and virtual machines with multiple copies of the tree on each (not always the same tree - moz-central vs inbound, for example). Keeping the patches in sync is a pain. I have to do this when testing on different platforms and when landing patches (moving from my dev tree to my clean clone of inbound). One solution is creating a repo in the patch queue; you can then manage your queue as another repo. This is nice because you get history and can wind back mistakes, as well as backup and synchronisation. However, it is a bit cumbersome. I would prefer something more convenient for syncing between repos on the same and different machines. So, I thought: what if I could use Dropbox to sync the patch queues? After all, for keeping files in sync, Dropbox is as convenient as it gets. It turns out to be not that easy, because Dropbox will not sync files outside of the Dropbox directory and hg will not allow you to use any directory other than '.hg/patches' for your patch queue. Furthermore, you can't just use a symlink, because you really don't want to synchronise the 'status' file, only the patches (and Dropbox will not exclude files). (Sometimes I want to synchronise the 'series' file, and sometimes not.)

My solution is to patch the mq extension to allow specifying the path to the patches and the meta-data files (yay for open source!). I can then set up symlinks from the subdirectories to my Dropbox directory and we're good to go. The path to the patches is fairly well hard-coded into the extension; I couldn't find a way to refer to a completely different directory, which would have meant not needing the symlinks. The syncing between repos is not quite real-time - it only happens when I qrefresh - but that's as good as it can get, really. I have to be a bit careful to keep the repos at similar ages so that the patches apply cleanly. Which patches are actually applied is also not synced, so I have to track that manually. And even if a patch is synced, it is not automatically re-applied to the tree; I have to do a manual qpop/qpush. Still, I think this is a real improvement to my workflow. It also means it is easy for people to see what I'm working on in almost real-time, which is pretty cool.

If you use qqueue to manage multiple patch queues, this technique should work cleanly with that. Each queue will need a subdirectory for each named path you specify. Each can be symlinked to a different directory in Dropbox or wherever.

Instructions

If you want to set up something like this yourself, you will need to:

  • grab the mercurial source
  • grab my patches
  • apply them
  • build mercurial
  • set up symlinks
  • configure mq

Start by cloning the hg repo (I guess we could use a source bundle, but we may as well get the bleeding edge). The easiest way to apply my patches is using a queue, so run qinit in the new hg directory. Download the patches and series file and un-tar them into the hg patches directory. Then hg qpush -a.

Then we need to build Mercurial. On Linux this was easy: I needed to install the python-dev package and then just run 'make local' in the hg directory (I'm not going to install the new hg as the default one in case something breaks). Then pop an alias to the new hg in your .bashrc.

Building and installing on Windows was much more of a pain, I think in part because I have a load of different versions of Mercurial installed in different places. Also, I work half in real Windows and half in the moz-build flavour of MinGW. Anyway, the following worked. More information here and here.

After cloning the hg repo, I installed a fresh version of MinGW (I already had Python, and I guess its headers). I had to explicitly add '/c/MinGW/bin' or whatever to my PATH (export PATH... in MinGW, not the system-wide Windows version). Then I ran 'python setup.py build --force -c mingw32' from the MinGW console (this failed from the Windows command prompt, even after playing with the path environment variable). I then manually copied all the .pyd files from 'hg\build\lib.win32-2.7\mercurial' to 'hg\mercurial' (where 'hg' is the directory I cloned hg into). Then I created an alias to the new version of hg, as on Linux.

Next, the symlinks. Actually, it turns out you can't use symlinks (on Linux), because hg throws a hissy fit in the name of 'security'. But you can use mount - 'mount -o bind $target $link', where target is inside Dropbox and link is inside .hg/patches. You can put the mount in your /etc/fstab to re-mount on startup.

On Windows use 'mklink /J $link $target'. Note that the order of link and target is reversed (compared to mount on Linux) and that you will have to use '\' not '/'. This only works in the Windows command prompt, not the MinGW console.

Finally, you need to configure mq. You do this by editing the hgrc file in the project's '.hg' directory (I guess you could use your ~/.hgrc file (Mercurial.ini on Windows) if you want to use the same sub-paths for all your repos, but I didn't). Add an [mq] section; you can then set 'seriespath', 'statuspath', 'patchespath', and 'guardspath' to add a sub-path for the various kinds of files. Any kind you don't specify a path for gets put in the usual place (.hg/patches). For example,

[mq]
seriespath = shared
patchespath = shared

will result in your patches and series file ending up in '.hg/patches/shared' and your status file in '.hg/patches'. (This is what I do for most of my repos; for some I don't specify seriespath so I can do different things there.)

Note: sometimes on Windows (never Linux, which is why I haven't fixed this) I see an error message when doing 'hg qnew'. But the patch gets created just fine, so it seems to be fine to ignore this.

Finally, if you would like to see what I am working on, you can find my patches here.

Thursday, June 20, 2013

Trychooser hg plugin

If anyone is building their own hg from the current source and uses the trychooser extension (https://github.com/pbiggar/trychooser), you'll find it doesn't work. Here is a hacked version that does: trychooser. Since it is not backwards compatible with older versions of hg, I guess we don't want the changes committed for now.

Saturday, April 13, 2013

The layers refactoring has landed!

I'm happy to report that the layers refactoring has landed on Mozilla central and is now in Nightly. We have already fixed a bunch of bugs (WebGL on b2g, plugins on Android, b2g tests, fixed position layers, ...) and are working on more. But, nothing seems insurmountable and it looks like the refactoring will stay landed.

We've tried to document the new system in the classes. The best place to start is gfx/layers/Compositor.h. To give an overview of the changes, there is now only one kind of layer on the compositor thread - composite layers, and one layer manager. These use a compositor interface to actually do the compositing, and there are (or will be) multiple compositor backends (see gfx/layers/opengl/CompositorOGL.h, for example). To implement a new OMTC backend, you should only have to implement a Compositor and one or more TextureHost, possibly also a TextureClient. There are other changes to how basic layers on the content thread interact with layers on the compositor thread. If it is not clear from the docs what is going on, please let me (nrc on irc) or nical know and we'll try to improve the docs!
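
To give a feel for the shape of a backend, here is a very rough skeleton (the class and method names here are paraphrased from memory, not copied from Compositor.h, so treat this as a sketch and check the real headers):

// Sketch of what a new OMTC backend roughly involves (names illustrative).
class MyCompositor /* : public layers::Compositor */ {
public:
  // Prepare the device/context for compositing a frame.
  void BeginFrame() {}
  // Draw one quad, textured from a TextureHost, with a transform and effects.
  void DrawQuad(/* rect, clip, effect chain, transform */) {}
  // Present the composited frame to the window/screen.
  void EndFrame() {}
};

class MyTextureHost /* : public layers::TextureHost */ {
public:
  // Take the shared surface handed over from the content side and make it
  // something the compositor above can sample from (upload, wrap, etc.).
  void Update(/* surface descriptor from the TextureClient */) {}
};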

We would love some help getting this tested! The easiest way is to simply use Firefox nightly for Android. If it crashes we should see it in the crash reports. If you notice anything rendering incorrectly, please either file a bug or let us know via irc (or even leave a comment here). You don't need to set any prefs etc. for Android, you will automatically get the refactored version.

We are most worried about FirefoxOS/b2g because that is where we use the most esoteric code paths and have the least automated testing. If you are working on b2g and are able to help us by running b2g built with m-c rather than one of the b2g branches, that would be great. Again, no prefs necessary, just using m-c is enough.

If you are on Linux or Mac and are feeling brave, then you can help us by running Firefox with OMTC on these platforms. Please bear in mind that this setup is unsupported for now and is known to be buggy and missing some features (plugins, for example). The most useful thing would be to compare Aurora (no refactoring) to Nightly (with refactoring) and let us know if anything has got worse. For both platforms you must set the pref "layers.offmainthreadcomposition.enabled" to true (in about:config). For Linux, you must also be able to run normally with OpenGL with no issues; to do this you need to set "layers.acceleration.force-enabled" to true. If you have not tried this before, I would try it before trying OMTC - there is lots of driver sadness around. For Linux you must also set the environment variable "MOZ_USE_OMTC=1". Note that when using OMTC, about:support will report the layers backend for the content thread only, i.e., it will not really be accurate - it will appear that you do not have HWA when you do.

Finding bugs for us to fix is much appreciated!

Finally, thank you to everyone working on mozilla-central for your patience whilst we fix(ed) bugs, we appreciate it! And thank you to all of the graphics team for their help getting this planned, written, finished, reviewed, and landed.

Thursday, March 28, 2013

Layers refactoring update

We missed our target landing date of 18th March. But, other than being a few weeks late, things are progressing nicely, in fact we are nearly done. All our tests pass and we are starting to get reviews of the code. Although we still have a couple of bugs to clear up, I think we are in good shape. To land we just need to get all our reviews and address any comments that arise (and fix those bugs, obviously). We hope that won't take too long and we now aim to land on mozilla-central as soon after the next uplift (2nd April) as possible. All going well, that means the layers refactoring will be in Firefox version 23. Once we land on mozilla-central it would be great to have lots of people test this, I'll blog about how to do that once we land.

You can keep an eye on our tests on tbpl and our reviews and responses on bugzilla.

Saturday, March 23, 2013

Firefox on Raspberry Pi

Some time ago I acquired a Raspberry Pi. These little computers are awesome; it is amazing that a full-blown computer can be had in a tiny little package for so little money. The possibilities for tinkering geeks and for education are endless. Of course, the first thing I wanted to do was to get Firefox running on the thing. I also wanted to be able to build Firefox for it so that I can hack on the graphics support for it, and to understand the process so that when an army of volunteers shows up wanting to hack on Firefox for Raspberry Pi (which I hope they do), I'll be able to help. (For a variety of pretty sad reasons, we can't support accelerated graphics in a supported configuration on the Raspberry Pi. That would be a massive boost to performance. I'm sure there are a lot of other interesting things we could do too. Most of them are gated on hardware acceleration though.)

Anyway, the Raspberry Pi is way too underpowered to actually compile Firefox on. So that means cross-compiling Firefox for the Raspberry Pi (ARMv6, Raspbian) on my PC (x64, Ubuntu). And then I spent three months (seriously) in an entire world of pain. I am really not a Linux whiz, I've never cross-compiled anything before, and I am not that familiar with Firefox's build system, so there was a lot to learn and a lot of painful ways to screw up. Not least of which was that I ended up upgrading Ubuntu in the middle of this, and after that I could no longer debootstrap wheezy. So, if you are running Ubuntu 12.04, DO NOT upgrade to 12.10!

Oleg Romashin (romaxa) has excellent instructions here for doing this. A lot of my pain came from not following these to the letter. He also helped me out so many times on this journey that it was embarrassing, so big thanks to romaxa! Any credit for getting this working at all goes to him, I just blundered through it and hope I can help others by sharing my experience.

Anyway, the overall plan is to build a crosstool-ng toolchain, set up a chroot, install Raspbian into the chroot, use Scratchbox2 to manage the whole thing, and finally use Scratchbox2 to build Firefox for an ARMv6 target.

Glossary

If you know all about this stuff, please skip this section. I had to look up most of these terms, so hopefully it will be helpful to some.

Cross-compile - to build a piece of software on one platform which will run on another platform. The target system is the one we will run the software on; the host system is the one we will build the software on.
chroot - chroot changes the root of the file system to a new directory for all programs executed inside the chroot ('inside' means run using the chroot program, not inside in terms of the directory structure). This allows the files and programs in the chroot to see a different set of files/programs/settings from the rest of the system. From inside the chroot, programs cannot see outside to the rest of the real file system.
toolchain - a bunch of tools and libraries used to compile a program. Compiling a program for a different target will require a different toolchain.
Crosstool-ng - a toolchain builder. You enter the configuration settings and crosstool-ng gives you a complete toolchain which you can then use to compile your software.
Scratchbox2 - a tool for making cross compiling easier. Scratchbox2 provides a virtual environment so configure and the like think they are in the target environment when they are executing on the host.
Raspbian - a version of the Debian Linux distro tailored for the Raspberry Pi.
Wheezy - a version of Debian/Raspbian. For some reason Linux distros use weird names instead of (or as well as) version numbers. I'm not sure why anyone would choose 'wheezy'; it does not exactly have connotations of speed and reliability. But I guess this is what you get when engineers choose names instead of marketing people.
debootstrap - tool for installing Debian into an existing OS/file system.
Linaro - an organisation which produces open source software for ARM systems. In the context of cross-compiling for ARM, Linaro usually refers to the Linaro compiler, a version of GCC specifically targeting ARM.

Building Crosstool-ng

Basically, follow the instructions at http://www.bootc.net/archives/2012/05/26/how-to-build-a-cross-compiler-for-your-raspberry-pi/. But, don't download the tar ball, clone the repo from http://crosstool-ng.org/hg/crosstool-ng and build according to the instructions here (http://crosstool-ng.org/#using_the_latest_development_stuff).

When you come to running menuconfig (ct-ng menuconfig), you should add support for C++ and use the latest versions of everything.

Follow the instructions on the wiki page for setting up your chroot and installing Scratchbox2. The wiki suggests putting Raspbian and Crosstool-ng in separate directories, but this did not work for me - I get errors when building Firefox. Specifically, __off_t and __pid_t being undefined types. The fix is to install Raspbian into $PATH_TO_CROSSTOOLS/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot. My CHROOTPATH variable is then $PATH_TO_CROSSTOOLS/x-tools/arm-unknown-linux-gnueabi/arm-unknown-linux-gnueabi/sysroot/wheezy-hf. (As an aside, the hf stands for hardware floating point, and means we are targeting ARM processors with hardware floating point capability (such as the Raspberry Pi). Some ARM chips do not have hardware support for floating point (-sf) and we would have to use software floating point routines. That requires different build targets for all the libraries etc.).

Install the necessary packages

Next, update any packages already installed with sudo chroot $CHROOTPATH apt-get update. Then install the packages from the wiki page (you can probably do without the Qt packages if you will use Gtk, like I did). I had to install additional packages. I'm not sure that all of these are essential - in fact I'm sure some aren't, but I'm not exactly sure which. In any case it's only a few MB. I installed: binutils-dev, libc-dev, and the Mozilla build prereqs from this wiki page, which are currently: mercurial g++ make autoconf2.13 yasm libgtk2.0-dev libglib2.0-dev libdbus-1-dev libdbus-glib-1-dev libasound2-dev libcurl4-openssl-dev libiw-dev libxt-dev mesa-common-dev. Some of them should already be installed (make, at least) and some you won't need (mercurial, because you already have the repo outside the chroot; g++, because we have a compiler installed by crosstool-ng; probably others).

You will need a custom mozconfig file. There is one on the wiki page, I used a different one, which you can find here. My mozconfig will give you a version of Firefox which is closer to the versions we distribute for Linux, but not as performant as the version using the mozconfig from the wiki.

Then build Firefox with sb2 MOZCONFIG=$PATH_TO_MOZCONFIG make -f client.mk and make a tarball of the distributable using sb2 MOZCONFIG=$PATH_TO_MOZCONFIG make package. You can then post it over to your Raspberry Pi using scp, a USB stick, or whatever. Extract using tar -xjf $FILENAME, which will create a firefox directory. Run using ./firefox in that directory. Then luxuriate in your Firefox on Raspberry Pi experience! (Warning - it will be pretty slow.)

An alternative configuration

At the end of all this, you'll get a version of Firefox which is as close as possible to that on other platforms. But unfortunately it does not support hardware acceleration. Alternatively, you can use the mozconfig on the wiki page, which will give you a version of Firefox that uses Qt rather than GTK and EGL rather than GLX. That is not a supported configuration, but it will give you hardware acceleration, which in turn allows for using OMTC and tiling, which might be enabled (I haven't tested; it looks like it might need a bit of fiddling with settings and possibly environment variables, but it might work out of the box).

Friday, March 22, 2013

Finding instructions generated from JS

This is a bit of a beginner's tip for hacking on the JS engine (because I have been hacking a tiny bit on the JS engine (specifically the ARM assembler), and I am definitely a beginner).

If you have just written some code-generating code, you probably want to see what code it actually generates. I found this not as easy as I expected.

The plan is to execute the generated code, break as we execute it (just after in my case, though I imagine just before is often more useful), then use gdb to see which instructions are under the program counter.

To break in the generated code, I just called |MacroAssemblerARMCompat::breakpoint()| in the code generation code, which, when we generate code, inserts a breakpoint instruction (and does some other fanciness too, but we don't need that for now).

I could not come up with a minimal test case in the JS shell which hit the breakpoint, so I had to try to find a test that did. (As an aside, just because the VM generates code does not mean that it will run it; I had not realised that.) I ran
./jit_test.py -f $PATH_TO_JS_SHELL
which runs all the tests and gives a command to run the failing ones. Hitting that breakpoint causes a segfault, and so any test that exercises it will fail.

The output commands look like
[objdir]/js -f [srcdir]/js/src/jit-test/lib/prolog.js -e "const platform='linux2'; const libdir='[srcdir]/js/src/jit-test/lib/'; const scriptdir='[srcdir]/js/src/jit-test/tests/v8-v5/'" -f [srcdir]/js/src/jit-test/tests/v8-v5/check-raytrace.js
You can then run gdb with the js shell (gdb ./js, assuming you are in the objdir) and start execution with
r -f [srcdir]/js/src/jit-test/lib/prolog.js -e "const platform='linux2'; const libdir='[srcdir]/js/src/jit-test/lib/'; const scriptdir='[srcdir]/js/src/jit-test/tests/v8-v5/'" -f [srcdir]/js/src/jit-test/tests/v8-v5/check-raytrace.js
(which are the arguments from the command above). Execution will quickly stop when you hit the breakpoint. At this point you can use a gdb command like
x /10i $pc-36
to give you the 10 instructions up to and including the one pointed to by the pc. You can adjust the 10 and 36 to get the required number of instructions. This will give output something like
   ...
   0x766a0c30:    sub    sp, sp, #20
   0x766a0c34:    stm    sp, {r0, r1, r2, r3, r4}
   0x766a0c38:    vpush    {d5}
   0x766a0c3c:    vpush    {d2-d3}
   0x766a0c40:    vpush    {d0}
=> 0x766a0c44:    bkpt    0x000b
The Mozilla pages on hacking JS and JavaScript tests were very useful along the way. Thanks to Marty Rosenberg and Nicolas Pierron for helping me along my way.

Thursday, March 21, 2013

Stupid British politicians and their stupid education policies

Urgh:
On Thursday Sir Michael Wilshaw, the chief inspector of England's schools, waded into the row, ordering the academics to get "out of their ivory towers". Pupils needed to learn some basic facts by heart, especially in maths and English, he said.
From The Guardian.

This makes me so angry! First, the 'ivory towers' thing is just a cheap dig and a populist way to attack people who have spent their (in this case long and decorated) careers in education, and who are probably some of the most educated people in the country. Perhaps they know something about education and you should listen?

But what really gets me is "...learn some basic facts by heart, especially in maths...". This sums up all that is wrong with the British (western?) attitude to maths. Learning maths by heart is not learning maths at all. If we taught maths properly then we might end up with some better (and more) science and technology students.

Saturday, March 16, 2013

How I learnt to stop worrying and love open source

I've always been a fan of open source, basically because who doesn't like free stuff? But I've never really seen the greatness that people get so excited about. There are two reasons for this. First, I am put off by some of the more fanatical elements of the community (I realise that this is due to a vocal minority, by the way). Second, people need to make a living, and giving away your product seems like a tough business model. Of course it can work - Mozilla being an excellent example, and there are many others - but there is no simple model along the lines of 'make something, sell it to people, ..., profit' which can be applied to open source software in general. Maybe that is not a bad thing, but it has stopped me fully embracing the idea of open source as a software engineering solution.

Open source has many, many advantages. After working for Mozilla for a year, I almost can't imagine how it is possible to work on a closed source project. Having the involvement of a wider community, being able to search the web for our code, our bugs, documentation, blogs giving insight into the code, not worrying about secrets, and so forth are truly wonderful. Contributing to an open source project is also the best way to learn about software engineering and any specific domain of it. If you are a student, or are looking for work, or looking to improve your software skills in any way, then there is absolutely nothing better you can do for yourself than to find an open source project and get stuck in (plug time - anyone interested in graphics in web browsers should get in touch!). It is probably my biggest regret about university that I didn't get involved with some open source projects, and instead worked on my own projects.

Anyway, all this is in the past. As of the last few weeks I LOVE open source and I am now truly a believer. The reason is that I acquired some new hardware, in particular a Raspberry Pi and a Samsung Chromebook. Both of these have ARM processors, which, although found in pretty much every phone and tablet, are a minority interest in terms of 'real' computing. First off, I have been amazed at the quantity and quality of open source software specifically aimed at such devices. There are no closed source equivalents and, due to economics I suppose, there never will be for niche areas like this.

Secondly, and here is the amazing bit, where the software I want for the platform I want doesn't exist, I can just compile it! This is so simple, yet so powerful. Even for niche areas you can usually get things like an OS and a browser, but what about all the other bits and pieces you want? For example, I use Sublime Text 2 as my main editor; it is a lovely piece of software and I use it on Windows, Linux, and Mac. But it is not open source and there are no binaries for Linux/ARM, so I cannot use it there. But SciTE is also a lovely text editor and IS open source, so I can just compile it and use it anywhere I want. That is amazing! No really, we take this a bit for granted, but in terms of encouraging innovation, open source is miles ahead due to this very simple fact.

Tuesday, February 19, 2013

OMTC Questions

mayankleoboy1 asked some questions in the comments of another blog post and I figured the answers might be of interest to a wider audience, so here they are (edited for order because it makes them easier to answer):

So when is the expected date for GFX and the m-c trees to merge ?

Around March 18th, as long as there are no unexpected problems. This is our goal date, not a promise :-)

[...] a lot of the OMT* is being done on priority for FFOS and Android, and later trickling to desktops. Has the traditional desktpos (win, lin and mac) market become second tier platform for mozilla ?

No, certainly not, although we have a lot of work to do on mobile, so that is a focus for many engineers right now. OMT* is developed for mobile because it is needed most there. Without OMTC, Firefox for Android is really unusably bad. Without OMTA FirefoxOS is really slow and jittery in some key places. Desktop Firefox works pretty well without them (although it will work better with).

Why is that OMT* work lags on windows, compared to OSX and Linux ? AFAIK, windows makes 90% of mozilla users. So shouldnt windows desktop get more priority ?

As I said above, the focus of the OMT* work has been for mobile and that means OpenGL. We don't support OpenGL on Windows, so we don't have OMT* there. We only have it on Mac and Linux to make mobile development easier - it is not yet a supported configuration on either platform (although it will be in the future). Implementing OMT* on a different graphics backend has been very daunting. One goal of the layers refactoring is to make that easier. Our current focus for OMTC is Windows, in particular for the Metro browser. Unless there are unforeseen hurdles, Windows will be the next platform to get OMTC. (OMTA has a few other issues before it can be used anywhere other than FirefoxOS (including on Android), not least of which is testing).

And yes, Windows is a higher priority for Mozilla (in general) than Linux and Mac, although user share is not the sole determinant of priority (Linux gets a lot of love (relative to its user share) because it is more closely aligned to our mission and a lot of developers use it, for example).

Thursday, February 07, 2013

Skia canvas on Windows XP

Using Skia as the rendering backend for canvas has been an option for a while now. Skia is now the default for Windows XP users. That will filter out to nightlies today or possibly tomorrow. It should make canvas perform a bit better on XP.

At the moment our benchmarking does not make a solid case for making it the default on other platforms. If you are not on XP and would like to experiment (possibly exposing yourself to 'fun' bugs) you can use Skia by setting the pref gfx.canvas.azure.backends to 'skia'.

Thanks to Rik Cabanier, Matt Woodrow, Jet, and George Wright for getting this done.

Tuesday, February 05, 2013

A fun bug

(Actually this is a two for one kind of a deal)

I've spent the last two days finding two tricky bugs in my port of tiled Thebes layers to the async compositing API. I think they are kind of fun, so I'll try and describe them here. I'll try to elide the details a bit. If you want to check out the real code, look at ContentClient.cpp, ContentHost.cpp, and BasicTiledThebesLayer.cpp on the graphics branch.

First, the old way. A tile buffer keeps a bunch of tiles (the actual tiles, not references, that is important) and each tile keeps a reference to a gfxReusableSurfaceWrapper. A gfxReusableSurfaceWrapper is kind of neat, it keeps a reference to a surface and can be locked for reading. When we want to write to it we ask for its surface. If it is locked, then you get a fresh surface (with a new gfxReusableSurfaceWrapper to wrap it). If it is not locked, you get the same surface as last time.

To render the tiled layer, the content thread gets a surface for each tile and paints to it. When the tiled layer is rendered, a copy of the tile buffer is made in the heap and a reference is passed to the compositor thread. The compositor thread locks all of the gfxReusableSurfaceWrappers (via the tiles and buffers) and blits them to the screen.

Note that if the gfxReusableSurfaceWrapper is locked and we get a new surface when painting, then we store the new gfxReusableSurfaceWrapper in the tile and lose track of the old gfxReusableSurfaceWrapper. Also, gfxReusableSurfaceWrappers are reference counted. They are destroyed when there are no more references to them. Finally, it is very important that when a gfxReusableSurfaceWrapper is destroyed it is not locked for reading; we assert that.

This sounds fun already, right? But the fun bit is still to come...

As far as we are concerned, the main effect of refactoring into the new compositing API is that we add another layer between the tiles and the gfxReusableSurfaceWrappers. We add a TextureClient. The tile holds a reference to the TextureClient and the TextureClient holds a reference to the gfxReusableSurfaceWrapper. The TextureClient lives on the heap and is also reference counted.

What could go wrong?

What goes wrong is that we trigger an assertion by trying to destroy a locked gfxReusableSurfaceWrapper. Figuring out why took me a little while. What should happen is that the copy of the buffer and its tiles on the compositor thread keeps the gfxReusableSurfaceWrappers alive once the tiles on the content thread forget about them. That works because we only lock the tiles for reading when we pass them to the compositor, and because when we copy the buffer (a bitwise copy) we copy all the tiles, creating another reference to each gfxReusableSurfaceWrapper. But, with the TextureClients, the tiles are copied and we add another reference to the TextureClients, but the TextureClients themselves are not copied, so we still have only one reference to each gfxReusableSurfaceWrapper. Thus, the next time around, if we get new gfxReusableSurfaceWrappers and forget about the old ones, then they are destroyed, even though they are locked by the compositor! The fix is to do a 'deep' copy, copying the TextureClients rather than making another reference to them.

What could go wrong?

This gives rise to the really fun bug: if you do the 'deep' copy on the compositor thread, you still hit the same assertions, just much less often. What is happening here is that there is a gap between when the tiles are locked (content thread) and when we make the copy (compositor thread). Sometimes we might get to repaint (content) before we composite the previous round, and that means we un-reference the gfxReusableSurfaceWrappers after we lock and before we copy. That took a while to find, but in retrospect doing the 'deep' copy on the compositor thread was dumb; I'm not sure why I did that. The fix is easy - just move the deep copy to the content thread.
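
To make the ownership problem concrete, here is a stripped-down model of the two copies (using std::shared_ptr in place of our reference counting; the class names echo the real ones but the code is purely illustrative):

#include <memory>
#include <vector>

// Stand-in for gfxReusableSurfaceWrapper: refcounted, and must never be
// destroyed while the compositor still has it locked for reading.
struct SurfaceWrapper {
  bool lockedForReading = false;
};

// Stand-in for TextureClient: a refcounted heap object holding the wrapper.
struct TextureClient {
  std::shared_ptr<SurfaceWrapper> wrapper = std::make_shared<SurfaceWrapper>();
};

// Tiles are lightweight value objects; copying a tile copies the pointer to
// the TextureClient, not the TextureClient itself.
struct Tile {
  std::shared_ptr<TextureClient> client;
};

// The buggy hand-off: a shallow (bitwise) copy of the tiles shares the
// TextureClients, so the compositor's copy does NOT hold its own reference
// to the old SurfaceWrapper. When the content side later swaps a fresh
// wrapper into the shared TextureClient, the old (still locked) wrapper dies.
std::vector<Tile> ShallowCopy(const std::vector<Tile>& aTiles) {
  return aTiles;
}

// The fix: a 'deep' copy that clones each TextureClient, so the compositor's
// copy keeps the old wrapper alive for as long as it is locked. (And it has
// to happen on the content thread, before the wrappers can be swapped out.)
std::vector<Tile> DeepCopy(const std::vector<Tile>& aTiles) {
  std::vector<Tile> copy;
  copy.reserve(aTiles.size());
  for (const Tile& t : aTiles) {
    copy.push_back(Tile{std::make_shared<TextureClient>(*t.client)});
  }
  return copy;
}

int main() {
  std::vector<Tile> contentTiles(1);
  contentTiles[0].client = std::make_shared<TextureClient>();
  contentTiles[0].client->wrapper->lockedForReading = true;   // handed to compositor

  std::vector<Tile> compositorTiles = DeepCopy(contentTiles); // keeps old wrapper alive

  // Content thread paints again and gets a fresh wrapper; with ShallowCopy
  // the old, still-locked wrapper would be destroyed right here.
  contentTiles[0].client->wrapper = std::make_shared<SurfaceWrapper>();
  return 0;
}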

Monday, February 04, 2013

Throttling off main thread animations

For the last few months I have been working mostly on throttling off main thread animations (OMTA), in between a little of the layers refactoring, which I'm now returning to. Under OMTA, CSS animations and transitions are animated on the compositor thread. That makes things run faster (because the main thread is free to do other work) and smoother (because if the main thread gets bogged down in some work, the compositor thread can carry on animating smoothly). Much of the work for OMTA was done by one of our awesome interns, David Zbarsky.

The old way of doing CSS animations (and the way we still do things for most properties) is for layout to do all the work. Every frame of the animation, the necessary parts of the webpage are laid out (the process of converting HTML to graphical objects) and rendered (converting those graphical objects to pixels) afresh with the correctly interpolated property value. If we have off-main-thread composition (where each layer is rendered on the main thread, but layers are composited together on a separate thread) then we can instead lay out the web page once and change the way we composite to take account of the animation. The initial implementation did this in such a way that the main thread still did a layout run for each frame, to keep its model up to date, and the compositor did its own animations too. That got the smoothness but not the speed-up. In fact, since we did the interpolating twice, it presumably slowed things down slightly. My task was to finish off the work to stop animating on the main thread (bug 780692). That is the 'throttling' bit. It has been surprisingly difficult; easily the hardest and most frustrating problem I have worked on at Mozilla. But also lots of fun.

The main difficulty is that we do sometimes need to 'catch up' on the main thread, mostly when we need to respond to some JS/DOM stuff. For example, if we have to test whether the mouse cursor is over an element with an animated scale, we need the current value of that scale to be able to tell whether the cursor is inside that element. That means that layout, which runs on the main thread, needs to have an accurate picture of the state of the animation. We call this update of layout a mini-flush. We do a mini-flush periodically (every 200ms at the moment) and when we need to have accurate information for DOM stuff. What happens during a mini-flush is that we calculate the animated values for that moment in time and post them to the compositor. It gets tricky because we want to avoid doing a full (and very expensive) re-layout of everything and only update the animating property of the animated element. It gets even trickier because it might have been a restyle which requires the animation data and we cannot start a new restyle pass while one is already in progress.
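
Here is a tiny sketch of the throttling decision itself (purely illustrative, with std::chrono standing in for our timestamps; the real logic lives in the animation/transition managers and is considerably hairier):

#include <chrono>

using Clock = std::chrono::steady_clock;

// Illustrative only: decide whether the main thread can skip updating style
// for a compositor-driven animation on this refresh tick.
class ThrottledAnimation {
public:
  // aDomNeedsExactStyle: script is about to ask a question that depends on
  // the animated value (e.g. hit testing against an animated transform).
  bool CanSkipMainThreadUpdate(Clock::time_point aNow, bool aDomNeedsExactStyle) {
    if (aDomNeedsExactStyle) {
      mLastMiniFlush = aNow;
      return false;   // mini-flush right now, for correctness
    }
    if (aNow - mLastMiniFlush >= std::chrono::milliseconds(200)) {
      mLastMiniFlush = aNow;
      return false;   // periodic mini-flush to keep layout roughly in sync
    }
    return true;      // throttled: the compositor keeps animating on its own
  }

private:
  Clock::time_point mLastMiniFlush = Clock::now();
};

int main() {
  ThrottledAnimation anim;
  return anim.CanSkipMainThreadUpdate(Clock::now(), false) ? 0 : 1;
}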

I have skipped *a lot* of the details here. There is a lot of interesting discussion in bug 780692 if you are interested in this stuff.

Currently OMTA is only used on Firefox OS. There is work in progress to port it to Firefox on Android, and that shouldn't be too hard. It should work fine on desktop (that is where I did most of the development work), but it requires OMTC (which in turn, currently, requires hardware acceleration), which is a little way off for all platforms. Once we have that, OMTA should be good to go.

If you are writing a webpage, there is no way to guarantee you'll get OMTA. But you have a good chance if you use CSS animations or transitions to animate either the transform or opacity, and don't have a 3D transform on that element. For example, most of the 'windowing' animations on Firefox OS (window opening animation, window changing animation, etc.) get OMTA.

Saturday, February 02, 2013

Layers refactoring update

I got taken off the layers refactoring last year to work on off-main thread animations for Firefox OS (more on that in another post). In the meantime Bas and Nical have been carrying on the refactoring work. As of a few weeks ago I am back on the refactoring. And so is most of the graphics team in some capacity. It has become a high priority for us because it blocks OMTC on Windows, which blocks our Windows Metro browser. I have been converting tiled layers to the refactored setup (more below). Bas has got OMTC going on Windows using Direct3D 11, still early days though. There have been some small architectural changes (also below), and work carries on. We're getting there, we hope to merge the graphics branch (where you can follow our progress and contribute, beware builds can be very shaky) to Mozilla Central around the end of February.

On the architectural side, there are two major changes: textures communicate via their own IPDL protocol, and textures can be changed dynamically. There has also been some renaming - what used to be called BufferHost/Client are now called CompositableHost/Client. Many of the flavours of Texture* and Compositable* have changed as we go for cleaner abstractions rather than trying to closely match existing code.

Textures (and soon Compositables) communicate directly with one another using their own IPDL protocols, rather than using the Layers protocol. Communication mostly still occurs within the Layers transactions, so we avoid any race conditions. The advantage of separate protocols is that each abstraction layer is more fully isolated - the layers don't know what their textures are doing and so forth.

It is a requirement that Textures can be changed dynamically. This is a shame. It would be nice (and sensible) if, once we create a layer, its Textures remained of the same type unless the layer changes them. But this is not the case; for example, async video can change from RGB to YCbCr after the first frame without the layer knowing. So, we have to deal with the texture changing under us (i.e., under the Layer and Compositable), which, since the Textures use their own communication mechanism, is complicated. This has led to a lot of code churn, but hopefully we have a solution now. It will be an interesting challenge for our test frameworks to see if they pick up all the bugs!

Personally, I have been concentrating on tiled layers (and tidying up a whole bunch of TODOs and little things). Tiled layers are Thebes layers which are broken up into a grid of tiles. We use tiled layers on Android, but have long term plans to use them pretty much everywhere. Each tile is a texture and they are managed by a TiledBuffer which is held by the layer. There is thus an obvious mapping to the refactored layers system. Unfortunately that didn't work so well. Perhaps in the long term we can end up with something like that. For now, the Compositable owns a TiledBuffer which manages tiles which hold Textures. This is because the buffer is copied from the rendering to compositing threads, and tiles are required to be lightweight value objects, but Textures are ref counted heap objects. Once we have an initial landing of the refactoring, we can hopefully change the tiled layers architecture to match the refactoring vision and we'll be sweet (which will allow tiled layers to work cross-process too, currently they only work cross-thread).

Tuesday, January 22, 2013

Urgh, movies in the internet era

This has been said many times by many others, mostly more eloquently than I will. But it is annoying me and I want to rant. Buying or renting movies sucks! I want to rent (or buy) a movie to watch (take my money!) but it is so difficult. Especially if I don't want some obsolete piece of plastic to store it on.

It should be easy: I go online, I find a movie I want to watch, I pay some money, I watch the movie. Possibly with a downloading step in there too. I won't even object if the movie deletes itself after I watch it, if the price is reasonable. This pretty much works for music via iTunes (only iTunes as far as I can see. This pains me because I hate Apple and everything they stand for, and I would prefer not to give them money. But Amazon (for example) does not work in NZ).

I have pretty much given up on legally getting a movie to watch on my computer in NZ via a wire. I had high hopes of watching films on my Nexus 7. In the UK this worked pretty well: I used the Play store, selected a movie, and tried to pay. My NZ card wouldn't work. Luckily I have a UK one, and that was OK. I watched the movie, it was good.

Now I am in Canada. I would like to rent/buy another movie. I go to the Play store and select a movie, but now I can't use either card. I don't have a Canadian card, so I can't rent a movie in Canada - WTF?! I can't connect to the UK Play store because it selects the country based on (I presume) IP address. And none of this works at all in NZ. It is all ridiculous; the business model does not account for people who leave their home country or live outside a few select countries. It is all so sad and stupid. And then people wonder why there is movie piracy. The industry deserves to die for this. It makes me so angry because the solution is so easy, and yet it is avoided in the short-sighted belief that more money can be made.