
Setting Up a CI System Part 1: Preparing Your Test Machines

Under contracting work for Valve Corporation, I have been working with Charlie Turner and Andres Gomez from Igalia to develop a CI test farm for driver testing (mostly graphics).

This is now the fifth CI system I have worked with / on, and I am growing tired of not being able to re-use components from the previous systems due to how deeply integrated their components are, and how implementation details permeate from one component to another. Additionally, such designs limit the ability of the system to grow, as updating one component impacts many others, making it difficult or even impossible to do without a rewrite of the system, or taking the system down for multiple hours.

With this new system, I am putting emphasis on designing good interfaces between components in order to create an open source toolbox that CI systems can re-use freely and tailor to their needs, while not painting themselves into a corner.

I aim to blog about all the dif­fer­ent com­po­nents/in­ter­faces we will be mak­ing for this test sys­tem, but in this ar­ti­cle, I would like to start with the ba­sics: propos­ing de­sign goals, and set­ting up a ma­chine to be con­trol­lable re­motely by a test sys­tem.

Over­all de­sign prin­ci­ples

When de­sign­ing a test sys­tem, it is im­por­tant to keep in mind that test re­sults need to be:

  • Sta­ble: Re-ex­e­cut­ing the same test should yield the same re­sult;
  • Re­pro­ducible: The test should be runnable on other ma­chines with the same hard­ware, and yield the same re­sult;

What this means is that we should use the de­fault con­fig­u­ra­tion as much as pos­si­ble (no weird setup in CI). Ad­di­tion­ally, we need to re­duce the amount of state in the sys­tem to the ab­solute min­i­mum. This can be achieved in the fol­low­ing way:

  • Power cy­cle the ma­chine be­tween each test cy­cle: this helps re­set the hard­ware;
  • Go disk­less if at all pos­si­ble, or treat the disk as a cache that can be flushed when test­ing fails;
  • Pre-com­pute as much as pos­si­ble out­side of the test ma­chine, to re­duce the im­pact of the en­vi­ron­ment of the ma­chine run­ning the test.

Fi­nally, the ma­chine should not re­strict which ker­nel / Op­er­at­ing Sys­tem can be loaded for test­ing. An easy way to achieve this is to use net­boot (PXE), which is a com­mon BIOS fea­ture al­low­ing disk­less ma­chines to boot from the net­work.

Con­vert­ing a ma­chine for test­ing

Now that we have a pretty good idea about the de­sign prin­ci­ples be­hind prepar­ing a ma­chine for CI, let’s try to ap­ply them to an ac­tual ma­chine.

Step 0: Se­lect ma­chines with­out in­ter­nal bat­ter­ies

While lap­tops and other hand-held de­vices are com­pact de­vices you may al­ready have avail­able, they can be tricky to power cy­cle. They may not boot once the bat­tery gets dis­con­nected, their per­for­mance may be de­graded, or they may out­right crash when un­der stress as the bat­tery isn’t there to smooth out the power rails, lead­ing to brownouts…

Your time is valuable, and this is especially true if this is your first experience with a bare-metal CI system. I would suggest starting with x86-based single-board computers, or (small-form-factor?) desktop PCs if at all possible. As an example, if you wanted to test Apple Silicon, I would recommend sourcing Mac Minis rather than a MacBook Air.

Any­way, if you de­cide to go for­ward with a bat­tery-pow­ered ma­chine, the first step is sim­ply to at­tempt boot­ing it with the bat­tery dis­con­nected. Be re­ally pa­tient as the ma­chine may take longer to boot than usual due to the em­bed­ded con­troller re­peat­edly fail­ing to com­mu­ni­cate with the now-dis­con­nected bat­tery. This can take a minute or two…

If the machine did manage to boot up, congratulations! You will now need to verify that its performance is unaffected by the change, and that it remains stable. A quick and easy check would be to run the following stress test while monitoring which frequencies the CPU reaches:

$ stress -c `nproc` -i 10 -d 5 -t 120
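To watch the CPU frequencies while the stress test runs, something like the following could be started in a second terminal. It is only a minimal sketch, assuming a Linux machine that exposes the cpufreq sysfs files (`scaling_cur_freq`, in kHz); the function names are made up for this example.

```python
import glob
import os
import time


def read_cpu_freqs_khz(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: current frequency in kHz} for every CPU exposing cpufreq."""
    freqs = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in glob.glob(pattern):
        cpu = path.split(os.sep)[-3]  # e.g. "cpu0"
        with open(path) as f:
            freqs[cpu] = int(f.read().strip())
    return freqs


def track_max_freqs(duration_s, interval_s=1.0, sysfs_root="/sys/devices/system/cpu"):
    """Poll the frequencies for duration_s seconds and keep the maximum seen per CPU."""
    max_seen = {}
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for cpu, khz in read_cpu_freqs_khz(sysfs_root).items():
            max_seen[cpu] = max(khz, max_seen.get(cpu, 0))
        time.sleep(interval_s)
    return max_seen
```

Calling `track_max_freqs(120)` covers the 120 seconds of the stress run. Comparing the maximum frequencies with and without the battery connected should show whether the machine is being throttled.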

If the test passed, con­sider your­self lucky! You may want to pro­ceed with this tu­to­r­ial!

How­ever, if the boot fails/takes too long, or if the ma­chine is not op­er­at­ing at its ex­pected per­for­mance or re­li­a­bil­ity, all is not lost. Check out your op­tions in our ded­i­cated en­try in the FAQ.

Step 1: Pow­er­ing up the ma­chine re­motely

In or­der to power up, a ma­chine of­ten needs both power and a sig­nal to start. The lat­ter is usu­ally pro­vided by a power but­ton, but ad­di­tional ways ex­ist (non-ex­haus­tive):

  • Wake on LAN: An Eth­er­net frame sent to the net­work adapter trig­gers the boot;
  • Power on by Mouse/Key­board: Any ac­tiv­ity on the mouse or the key­board will boot the com­puter;
  • Power on AC: Pro­vid­ing power to the ma­chine will au­to­mat­i­cally turn it on;
  • Timer: Boot at a spec­i­fied time.

An Intel motherboard's list of wakeup events
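As an illustration of the Wake on LAN trigger, here is a minimal sketch of how a "magic packet" can be built and broadcast. The packet layout (6 bytes of 0xFF followed by the target MAC address repeated 16 times) is the standard one; sending it over UDP to port 9 on the broadcast address is a common convention rather than a requirement, and the function names are assumptions made for this example.

```python
import socket


def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16


def send_wake_on_lan(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP (address and port are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Keep in mind that this only provides the start signal: the network adapter must still be powered and Wake on LAN must be enabled in the BIOS.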

Unfortunately, none of these triggers can be used to also turn off the machine. The only way to guarantee that a machine will power down and reset its internal state completely is to cut its power supply for a significant amount of time. A safe way to provide/cut power is to use a remotely-switchable Power Distribution Unit (example), a managed Ethernet switch with per-port switchable PoE (Power over Ethernet), or simply a smart plug such as Shelly plugs or Ikea’s TRÅDFRI. In any case, make sure that you rely on as few services as possible (no cloud!), that you won’t exceed the ratings of the power supply (voltage, power, and cycles), and that you can read back the state to confirm the command was received. If you opt for industrial PDUs, make sure to check out PDU Gateway, our REST service to control the machines.

An example of a PDU
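Whatever device you pick, the power-cycling logic stays the same: send the command, read the state back, and keep the power off long enough for the machine to fully lose its state. The sketch below illustrates this with a purely hypothetical `PduPort` interface (it is not the PDU Gateway API) and an assumed 30-second off time, which you should tune for your hardware.

```python
import time
from typing import Protocol


class PduPort(Protocol):
    """Hypothetical interface; adapt it to whatever your PDU or smart plug exposes."""

    def set_state(self, on: bool) -> None: ...
    def get_state(self) -> bool: ...


def set_and_verify(port, on, retries=3, settle_s=1.0, sleep=time.sleep):
    """Send the command, then read the state back to confirm it was applied."""
    for _ in range(retries):
        port.set_state(on)
        sleep(settle_s)
        if port.get_state() == on:
            return
    raise RuntimeError(f"PDU port did not switch {'on' if on else 'off'}")


def power_cycle(port, off_time_s=30.0, sleep=time.sleep):
    """Cut power long enough for the machine to lose its state, then restore it."""
    set_and_verify(port, False, sleep=sleep)
    sleep(off_time_s)  # assumed value, not a measured one
    set_and_verify(port, True, sleep=sleep)


class FakePort:
    """In-memory stand-in used to try the logic without hardware."""

    def __init__(self):
        self.on = True
        self.history = []

    def set_state(self, on):
        self.on = on
        self.history.append(on)

    def get_state(self):
        return self.on
```

Raising an error when the read-back does not match lets the test system flag a broken PDU port instead of silently running jobs on a machine that was never reset.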

Now that we can reliably cut/provide power, we still need to control the boot signal. The difficulty here is that the signal needs to be received after the machine has received power and initialized enough to handle this event. The easiest solution is to configure the BIOS to boot as soon as power is brought to the computer. This is usually called “Boot on AC”. If your computer does not support this feature, you may want to try the other triggers, or use a microcontroller to press the power button for you when powering up (see the HELP! My machine can’t … Boot on AC section at the end of this article).

Step 2: Net boot­ing

Net boot­ing is quite com­monly sup­ported on x86 and ARM boot­load­ers.

On x86 plat­forms, you can gen­er­ally find this op­tion in the boot op­tion pri­or­i­ties un­der the name PXE boot or network boot. You may also need to en­able the LAN option ROM, LAN controller, or the UEFI network stack. Re­boot, and check that your ma­chine is try­ing to get an IP!

On ARM/RISC-V platforms, the board’s bootloader may already default to PXE booting when no bootable media is found (see Raspberry Pi’s boot sequence). If your board doesn’t do it by default, don’t panic. You’ll just need to install a bootloader that will do the job:

  • Mod­ern:
    • bare­box: A POSIX/Linux-like in­ter­face, but few boards sup­ported;
    • tow-boot: Good sup­port for the pop­u­lar SBCs, sane de­faults, good UI;
    • tianocore / EDK2: Full-UEFI en­vi­ron­ment, nice UI, but slow to boot;
  • Old-school:
    • u-boot: Widest board compatibility, good feature-set, but only works with small kernels. Use as a last resort!

The next step will be to set up a machine, called Testing Gateway, that will provide a PXE service. This machine should have two network interfaces, one connected to a public network, and one connected to the test machines (through a switch). Setting up this machine will be the subject of an upcoming blog post, but if you are impatient, you may use our valve-infra container or the simpler netboot2container.

Step 3: Em­u­lat­ing your screen and key­board us­ing a se­r­ial con­sole

Thanks to the pre­vi­ous steps, we can now boot in any Op­er­at­ing Sys­tem we want, but we can­not in­ter­act with it…

One so­lu­tion could be to run an SSH server on the Op­er­at­ing Sys­tem, but un­til we could con­nect to it, there would be no way to know what is go­ing on. In­stead, we could use an an­cient tech­nol­ogy, a se­r­ial port, to drive a con­sole. This so­lu­tion is of­ten called “Se­r­ial con­sole” and is sup­ported by most Op­er­at­ing Sys­tems. Se­r­ial ports come in two types:

  • UART: volt­age chang­ing be­tween 0 and VCC (TTL sig­nalling), more com­mon in the Sys­tem-on-Chip (SoC) and mi­cro­con­trollers world;
  • RS-232: volt­age chang­ing be­tween a pos­i­tive and neg­a­tive volt­age, more com­mon in the desk­top and dat­a­cen­ter world.

In any case, I suggest you find a serial-to-USB adapter suited to the computer you are trying to connect.

On Linux, using a serial console is relatively simple: just add the following to the kernel command line to get a console on your screen AND over the /dev/ttyS0 serial port running at 9600 baud:

console=tty0 console=ttyS0,9600 earlyprintk=vga,keep

If your ma­chine does not have a se­r­ial port but has USB ports, which is more the norm than the ex­cep­tion in the desk­top/lap­top world, you may want to con­nect two RS-232-to-USB adapters to­gether, us­ing a Null mo­dem ca­ble:

Test Machine <-> USB <-> RS-232 <-> NULL modem cable <-> RS-232 <-> USB Hub <-> Gateway

And the ker­nel com­mand line should use ttyACM0 / ttyUSB0 in­stead of ttyS0.
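On the gateway side, the console output can then be read like any other file. Here is a minimal sketch using only Python's standard library, assuming the adapter shows up as `/dev/ttyUSB0` on the gateway and runs at 9600 baud; the function names are made up for this example.

```python
import os
import re
import termios
import tty


def open_serial_console(path="/dev/ttyUSB0", baud=termios.B9600):
    """Open a serial device in raw mode at the given speed (9600 baud by default)."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    tty.setraw(fd)
    attrs = termios.tcgetattr(fd)
    attrs[4] = baud  # input speed
    attrs[5] = baud  # output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return os.fdopen(fd, "rb", buffering=0)


def wait_for_pattern(stream, pattern, max_lines=10000):
    """Read console lines until one matches pattern; return it, or None if the stream ends."""
    regex = re.compile(pattern)
    for _ in range(max_lines):
        raw = stream.readline()
        if not raw:
            return None
        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        if regex.search(line):
            return line
    return None
```

A test system could call `wait_for_pattern(open_serial_console(), r"Linux version")` to detect that the kernel started booting. Note that this sketch is line-based and has no timeout, so a prompt that is not followed by a newline would make it wait indefinitely.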

Putting it all to­gether

Start by removing the internal battery if the machine has one (laptops), and any built-in wireless antenna. Then set the BIOS to boot on AC, and to use netboot.

Steps for an AMD moth­er­board:

Steps for an In­tel moth­er­board:

Fi­nally, con­nect the test ma­chine to the wider in­fra­struc­ture in this way:

If you managed to do all this, then congratulations, you are set! If you ran into issues with any of the previous steps, brace yourself, and check out the following section!

HELP! My ma­chine can’t …

Net boot

It’s annoying, but it is super simple to work around. All you need is to install a PXE-capable bootloader on a drive or USB stick.

I would recommend you look into iPXE, as it is super easy to set up and amazingly versatile!

Boot on AC

Well, that’s a bum­mer, but that’s not the end of the line ei­ther if you have some ex­pe­ri­ence deal­ing with mi­cro­con­trollers, such as Ar­duino. Pro­vided you can find the fol­low­ing 4 wires, you should be fine:

  • Ground: The eas­i­est to find;
  • Power rail: 3.3 or 5V de­pend­ing on what your con­troller ex­pects;
  • Power LED: A sig­nal that will change when the com­puter turns on/off;
  • Power Switch: A sig­nal to pull-up/down to start the com­puter.

On desk­top PCs, all these wires can be eas­ily found in the moth­er­board’s man­ual. For lap­tops, you’ll need to scour the moth­er­board for these sig­nals us­ing a mul­ti­me­ter. Pay ex­tra at­ten­tion when look­ing for the power rail, as it needs to be able to source enough cur­rent for your mi­cro­con­troller. If you are strug­gling to find one, look for the VCC pins of some of the chips and you’ll be set.

Next, you’ll just need to fig­ure out what volt­age the power LED is at when the ma­chine is ON or OFF. Make sure to check that this volt­age is com­pat­i­ble with your mi­cro­con­troller’s in­put rat­ing and plug it di­rectly into a GPIO of your mi­cro­con­troller.

Let’s then do the same work for the power switch, except this time we also need to check how much current will flow through it when it is activated. To do that, just use a multimeter to check how much current is flowing when you connect the two wires of the power switch. Check that this amount of current can be sourced/sunk by the microcontroller, and then connect it to a GPIO.

Finally, we need to find power for the microcontroller that will be present as soon as we plug the machine into the power. For desktop PCs, you would find this in Pin 9 of the ATX connector. For laptops, you will need to probe the motherboard until you find a pin with a voltage suitable for your microcontroller (5 or 3.3V). However, make sure it is able to source enough current without the voltage dropping below the minimum acceptable VCC of your microcontroller. The best way to make sure of that is to connect this rail to the ground through a ~100 Ohm resistor and check the voltage at the leads of the resistor, and keep on trying until you find a suitable place (took me 3 attempts). Connect your microcontroller’s VCC and ground to these pads.
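The resistor test boils down to Ohm's law: the current drawn is the measured voltage divided by the resistance, so a 5V rail loaded with 100 Ohm sources about 50 mA. The helper below is a small sketch of that check; the minimum VCC and the current your microcontroller needs are values to take from its datasheet, and the numbers in the usage note are only placeholders.

```python
def load_current_ma(voltage_v, resistor_ohm=100.0):
    """Current drawn by the test resistor, from Ohm's law (I = V / R), in mA."""
    return voltage_v / resistor_ohm * 1000.0


def rail_is_usable(loaded_voltage_v, min_vcc_v, needed_ma, resistor_ohm=100.0):
    """True if the rail stays above min_vcc_v while sourcing at least needed_ma."""
    return (loaded_voltage_v >= min_vcc_v
            and load_current_ma(loaded_voltage_v, resistor_ohm) >= needed_ma)
```

For instance, `rail_is_usable(4.9, 4.5, 40)` describes a rail that only droops to 4.9V under load and would be acceptable for a board that needs 40 mA at 4.5V or more, while a rail collapsing to 3.0V would be rejected.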

The last step will be to edit this Ar­duino code for your needs, flash it to your mi­cro­con­troller, and it­er­ate un­til it works!
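For reference, the logic the microcontroller needs to implement is short. The following Python sketch is not the Arduino code mentioned above, only an illustration of the idea that you would port to your microcontroller: `read_power_led` and `press_power_switch` are hypothetical callbacks wrapping the two GPIOs, and the timings are assumptions to adjust for your machine.

```python
import time


def boot_on_ac(read_power_led, press_power_switch, attempts=5,
               press_s=0.2, wait_s=2.0, sleep=time.sleep):
    """Press the power switch until the power LED reports that the machine is on.

    read_power_led() -> bool and press_power_switch(pressed: bool) are
    hypothetical callbacks wrapping the power LED and power switch GPIOs.
    """
    for _ in range(attempts):
        if read_power_led():
            return True            # machine is already running
        press_power_switch(True)   # emulate a short button press
        sleep(press_s)
        press_power_switch(False)
        sleep(wait_s)              # give the machine time to react
    return read_power_led()
```

Checking the power LED before each press matters: pressing the button on a machine that is already running could trigger a shutdown instead.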

Here is a photo sum­mary of all the above steps:

Thanks to Arka­diusz Hiler for giv­ing me a cou­ple of these Blue­Pills, as I did not have any mi­cro­con­troller that would be small-enough to fit in place of a lap­top speaker. If you are a novice, I would sug­gest you pick an Ar­duino nano in­stead.

Oh, and if you want to cre­ate a board that would be generic-enough for most moth­er­boards, check out the schemat­ics from my al­most-decade-old blog post about do­ing just that!

Boot / run nor­mally (slow/un­re­li­able) with­out a bat­tery

Be­fore go­ing any fur­ther, I would re­ally urge you to re­con­sider your de­ci­sion to use this ma­chine for CI. If there are no other al­ter­na­tives, don’t de­spair, things will be …. ju­u­u­u­u­ust fine!

Your state of mind, right now!

Since we want our test machines to behave in the same way as users’, we should strive to minimize the impact of our modifications to the machine.

When it comes to the in­ter­nal bat­tery, we ide­ally want it to be con­nected while the ma­chine is run­ning (mir­ror­ing how users would use the ma­chine), and dis­con­nected be­tween test jobs so as to min­i­mize the chances of any state leak­ing be­tween jobs which would af­fect re­pro­ducibil­ity of re­sults.

We can achieve this goal at two lev­els: in soft­ware by hack­ing on the em­bed­ded con­troller, or phys­i­cally by mod­i­fy­ing the power-de­liv­ery.

1. Hack the Em­bed­ded Con­troller (EC)

If your de­vice’s firmware or em­bed­ded con­troller (EC) is open source, you should be able to mon­i­tor the state of the power sup­ply, and you prob­a­bly can find a way to turn off the ma­chine (the rou­tine called when press­ing the power but­ton for 10s) when the main power sup­ply is dis­con­nected.

Unfortunately, the only devices with open source EC I am aware of are Chromebooks, so your only choice may be to…

2. In­stru­ment the ma­chine’s power de­liv­ery

If we can’t get the em­bed­ded con­troller to do the work for us, we can do the same us­ing a 5V re­lay with a nor­mally-open con­tact, a few wires, a sol­der­ing iron, and an old USB power sup­ply!

The first step is to figure out a way to detect whether the power supply is connected or not. The foolproof way is to use an old USB charger, connected to the same PDU port as the machine’s power supply. This will provide us with a 5V power supply when the machine is supposed to be ON, and 0V otherwise.

  1. Dis­con­nect the power to the ma­chine
  2. Open up the ma­chine enough to ac­cess its main PCB and bat­tery
  3. Dis­con­nect the bat­tery
  4. Iden­tify the neg­a­tive and pos­i­tive leads of the bat­tery us­ing a volt­meter
  5. Cut the pos­i­tive lead(s)
  6. Sol­der ex­ten­sion wires to both sides
  7. Sol­der them to the nor­mally-open con­tacts of your 5V re­lay
  8. Sol­der the coil leads of the re­lay to an old USB-A ca­ble
  9. Se­cure every­thing with heat­shrink and hot­glue
  10. Close the ma­chine and test it

That’s all, folks!
