This post from last year was posted to a forum, so I thought I'd write up some rebuttals to their comments.

The first comment is by David Chisnall, creator of CHERI C/C++, which proposes solving the problem with CPU instruction-set extensions. It's a good idea, but after 14 years, mainstream CPUs still haven't adopted those extensions. Even mainstream RISC-V processors haven't been built with them.

Chisnall: "If your safety requires you to insert explicit checks, it’s not safe". This is true from one perspective, false from another. My proposal includes compilers spitting out warnings whenever bounds information doesn't exist.

C is full of problems in theory that don't exist in practice because the compiler spits out warnings telling programmers to fix the problem. Warnings can also note cases where programmers probably made mistakes. We can't achieve perfect guarantees, because programmers can still make mistakes, but we can certainly achieve "good enough".

Chisnall: ....thread safety..... I'm not sure I fully understand the comment. I understand that CHERI can guarantee atomicity of bounds checking, which would otherwise require multiple (interruptible) instructions. The number of cases where this matters is small, and the C proposal would be no worse than other languages like Rust.

Chisnall: Temporal safety.... A lot of Rust "ownership" techniques can be applied to C with these annotations, namely, marking which variables OWN allocated memory and which simply BORROW it. I've reviewed a lot of famous use-after-free and double-free bugs, and most can be trivially fixed by annotation.
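
To make this concrete, here's a minimal sketch of what such annotations might look like. The OWNS and BORROWS macros, and the attributes they'd expand to, are hypothetical -- they don't exist in any compiler today. The idea is that an annotation-aware compiler could then warn when a borrowed pointer is freed, or when an owned pointer is freed twice.

    #include <stdlib.h>

    /* Hypothetical annotations -- illustrative only, not an existing feature. */
    #define OWNS     /* would expand to something like __attribute__((owns)) */
    #define BORROWS  /* would expand to something like __attribute__((borrows)) */

    struct session {
        char *buf OWNS;        /* this struct owns the allocation */
    };

    void log_buf(const char *buf BORROWS);   /* borrows: must not free or retain */

    void session_free(struct session *s)
    {
        free(s->buf);          /* OK: the owner frees */
        s->buf = NULL;         /* an annotation-aware compiler could flag a second
                                  free(), or a free() of a BORROWS pointer */
    }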

Chisnall: If you are writing a blog never having actually tried to make large (million line or more) C codebases memory safe, you probably underestimate the difficulty by at least one order of magnitude. I'm both a programmer who has written a million lines of code in my lifetime and a hacker with decades of experience looking for such bugs. The goal isn't to pursue the ideal of a 100% safe language, but to get rid of 99% of safety errors. Being 1% less safe makes the goal an order of magnitude easier to reach.

snej: This post seems to epitomize the common engineer trait of seeing any problem you haven’t personally worked on as trivial. Sure bro, you’ll add a few patches to Clang and GCC and with those new attributes our C code will be safe. It’ll only take a few weeks and then no one will need Rust anymore. But I've spent decades working on this. The comment epitomizes the common trait of not realizing how much thought and expertise is behind the post. A few patches to clang and GCC will make C safer. The solution is still far less safe than Rust. In fact, my proposal makes code more interoperable with, and translatable into, Rust. Right now, translating C into Rust creates just a bunch of 'unsafe' code that needs to be cleaned up. With such annotations, a refactoring step using existing testing frameworks results in code that can then be auto-translated safely into Rust.

As for existing clang/gcc attributes, there are only a couple that match the macros I propose. They do show how trivial it would be to actually go further.

danso: In addition to the criticisms I share with everyone else, I found this to be one of the most “talk is cheap, show me the code” posts I’ve ever read. The reason I wrote the post is because learning clang/gcc internals is a long process, and when asking for help, I needed something to point to and say "this is what I'm trying to achieve". I'm not trying to communicate what other people should do, I'm communicating what I'm trying to do. I still don't know clang/gcc internals well enough to even get started ... any pointers would be helpful.




Uncategorized

The idea of memory-safe languages is in the news lately. C/C++ is famous for being the world's system language (that runs most things) but also infamous for being unsafe. Many want to solve this by hard-forking the world's system code, either by changing C/C++ into something that's memory-safe, or rewriting everything in Rust.

Forking is a foolish idea. The core principle of computer-science is that we need to live with legacy, not abandon it.

And there's no need. Modern C compilers already have the ability to be memory-safe; we just need to make minor -- and compatible -- changes to turn it on. Instead of a hard-fork that abandons legacy systems, this would be a soft-fork that enables memory-safety for new systems.

Consider the most recent memory-safety flaw in OpenSSL. They fixed it by first adding a memory-bounds variable, then putting every access to the memory behind a macro PUSHC() that checks that bounds.

A better (but currently hypothetical) fix would be something like the following:

size_t maxsize CHK_SIZE(outptr) = out ? *outlen : 0;

This would link the memory-bounds maxsize with the memory outptr. The compiler can then be relied upon to do all the bounds checking to prevent buffer overflows; the rest of the code wouldn't need to be changed.
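
To illustrate the intent, here's a hypothetical sketch. The emit_byte() helper is made up for this post; it shows roughly the kind of check that the existing PUSHC() macro does by hand, and which the compiler could insert (or demand) automatically once it knows that maxsize bounds outptr.

    #include <stddef.h>

    /* Hypothetical illustration -- not real compiler output. */
    int emit_byte(char *out, char **outptr, size_t maxsize, char c)
    {
        /* the check the programmer currently writes by hand via PUSHC(),
           and which the compiler could enforce once the bounds are linked */
        if ((size_t)(*outptr - out) >= maxsize)
            return 0;              /* would-be buffer overflow caught */
        *(*outptr)++ = c;
        return 1;
    }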

An even better (and hypothetical) fix would be to change the function declaration like the following:

int ossl_a2ulabel(const char *in, char *out, size_t *outlen CHK_INOUT_SIZE(out));

That's the intent anyway, that *outlen is the memory-bounds of out on input, and receives a shorter bounds on output.

This specific feature isn't in compilers yet. But gcc and clang already have other similar features, though they've only been halfway implemented. This feature would be relatively easy to add. I'm currently studying the code to see how I can add it myself -- I could mostly copy what's done for the alloc_size attribute. But there's a considerable learning curve, so I'd rather just persuade an existing developer of gcc or clang to add the new attributes for me.
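
For reference, here's one of those existing, halfway-implemented features: the alloc_size attribute that gcc and clang already support. Depending on warning flags and optimization level, the compiler can catch overflows of buffers whose size it learns this way; the proposed CHK_SIZE/CHK_INOUT_SIZE attributes would extend the same machinery to function parameters and struct fields. (my_alloc and demo are just illustrative names.)

    #include <stddef.h>
    #include <string.h>

    /* alloc_size(1) tells the compiler the returned buffer is 'n' bytes */
    void *my_alloc(size_t n) __attribute__((malloc, alloc_size(1)));

    void demo(void)
    {
        char *p = my_alloc(16);
        memset(p, 0, 32);    /* gcc/clang can warn: writing 32 bytes into an
                                object known (via alloc_size) to be 16 bytes */
    }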

Once you give the programmer the ability to fix memory-safety problems like the solution above, you can then enable warnings for unsafe code. The compiler knew the above code was unsafe, but since there was no practical way to fix it, it was pointless to nag the programmer about it. With these new features come warnings about failing to use them.

In other words, it becomes compiler-guided refactoring. Forking code is hard, refactoring is easy.

As the above function shows, the OpenSSL code is already somewhat memory safe, just based upon the flawed principle of relying upon diligent programmers. We need the compiler to enforce it. With such features, the gap is relatively small, mostly just changing function parameter lists and data structures to link a pointer with its memory-bounds. The refactoring effort would be small, rather than a major rewrite.

This would be a soft-fork. The memory-bounds checks would work only when compiled with new compilers. The macros would simply be ignored by older compilers.

Memory-safety is a problem. The idea of abandoning C/C++ isn't a solution. We already have the beginnings of a solution in modern gcc and clang compilers. We just need to extend it.

Uncategorized

Today is the 20th anniversary of the Slammer worm. I'm still angry over it, so I thought I'd write up my anger. This post will be of interest to nobody; it's just me venting my bitterness. Get off my lawn!!


Back in the day, I wrote "BlackICE", an intrusion detection and prevention system that ran as both a desktop version and a network appliance. Most cybersec people from that time remember it as the desktop version, but the bulk of our sales came from the network appliance.

The network appliance competed against other IDSs at the time, such as Snort, an open-source product. For much of the cybersec industry, IDS was Snort -- they had no knowledge of how intrusion-detection might work other than this product, because it was open-source.

My intrusion-detection technology was radically different. The thing that makes me angry is that I couldn't explain the differences to the community because they weren't technical enough.

When Slammer hit, Snort and Snort-like products failed. Mine succeeded extremely well. Yet, I didn't get the credit for this.


The first difference is that I used a custom poll-mode driver instead of interrupts. This is now the norm in the industry, such as with Linux NAPI drivers. The problem with interrupts is that a computer could handle fewer than 50,000 interrupts-per-second. If network traffic arrived faster than this, then the computer would hang, spending all its time in the interrupt handler and doing no other useful work. By turning off interrupts and instead polling for packets, this problem is prevented. The cost is that if the computer isn't heavily loaded by network traffic, then polling wastes CPU and electrical power. Linux NAPI drivers switch between the two: interrupts when traffic is light and polling when traffic is heavy.
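
Here's a sketch of the poll-mode idea. The names (nic_rx_ring_peek, nic_rx_ring_advance, process_packet) are placeholders standing in for whatever a real driver exposes -- this isn't BlackICE's code or any real API.

    struct packet;                           /* opaque packet buffer */
    struct packet *nic_rx_ring_peek(void);   /* placeholder NIC-ring accessors */
    void nic_rx_ring_advance(void);
    void process_packet(struct packet *pkt);

    void rx_poll_loop(void)
    {
        for (;;) {
            struct packet *pkt;

            /* spin on the receive ring directly, interrupts disabled:
               wastes CPU when idle, but never drowns in interrupt overhead
               when packets arrive at millions per second */
            while ((pkt = nic_rx_ring_peek()) == NULL)
                ;

            process_packet(pkt);
            nic_rx_ring_advance();
        }
    }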

The consequence is that a typical machine of the time (dual Pentium IIIs) could handle 2-million packets-per-second running my software, far better than the 50,000 packets-per-second of the competitors.

When Slammer hit, it filled a 1-gbps Ethernet with 300,000 packets-per-second. As a consequence, pretty much all other IDS products fell over. Those that survived were attached to slower links -- 100-mbps was still common at the time.

An industry luminary even gave a presentation at BlackHat saying that my claimed performance (2-million packets-per-second) was impossible, because everyone knew that computers couldn't handle traffic that fast. I couldn't combat that, even by explaining with very small words "but we disable interrupts".

Now this is the norm. All network drivers are written with polling in mind. Specialized drivers like PF_RING and DPDK do even better. Network appliances are now written using these things. Now you'd expect something like Snort to keep up and not get overloaded with interrupts. What makes me bitter is that back then, this was inexplicable magic.

I wrote an article in PoC||GTFO 0x15 that shows how my portscanner masscan uses this driver, if you want more info.


The second difference with my product was how signatures were written. Everyone else used signatures that triggered on pattern-matching. Instead, my technology included protocol-analysis, code that parsed more than 100 protocols.

The difference is that when there is an exploit of a buffer-overflow vulnerability, pattern-matching searched for patterns unique to the exploit. In my case, we'd measure the length of the buffer, triggering when it exceeded a certain length, finding any attempt to attack the vulnerability.

The reason we could do this was through the use of state-machine parsers. Such analysis was considered heavy-weight and slow, which is why others avoided it. In fact, state-machines are faster than pattern-matching -- many times faster. Better and faster.

Such parsers are now more common. Modern web-servers (nginx, IIS, lighttpd, etc.) use them to parse HTTP requests. You can tell if a server does this by sending 1-gigabyte of spaces between "GET" and "/". Apache gives up after 64k of input. State-machines keep going, because while in that state ("between-method-and-uri"), they'll accept any number of spaces -- the only limit is a timeout. Go read the nginx source-code to understand how this works.
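
Here's a toy sketch of that idea (not nginx's actual code): because the parser keeps a tiny state per connection and consumes input byte-by-byte, a gigabyte of spaces just keeps it sitting in one state, with nothing buffered.

    #include <stddef.h>

    enum http_state { S_METHOD, S_SPACE_BEFORE_URI, S_URI /* ...more states... */ };

    struct http_parser { enum http_state state; };

    void http_parse(struct http_parser *p, const char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            char c = buf[i];
            switch (p->state) {
            case S_METHOD:
                if (c == ' ')
                    p->state = S_SPACE_BEFORE_URI;
                break;
            case S_SPACE_BEFORE_URI:
                if (c != ' ')              /* any number of spaces stays here */
                    p->state = S_URI;
                break;
            case S_URI:
                /* ...and so on for the rest of the request line */
                break;
            }
        }
    }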

I wrote a paper in PoC||GTFO 0x21 that shows the technique by implementing the common wc (word-count) program. A simplified version of this is wc2o.c. Go read the code -- it's crazy.

The upshot is that when Slammer hit, most IDSs didn't have a signature for it. If they didn't just fall over, what they triggered on were things like "UDP flood", not "SQL buffer overflow". This led many to believe what was happening was a DDoS attack. My product correctly identified the vulnerability being exploited.


The third difference with my product was the event coalescer. Instead of a timestamp, my events had a start-time, end-time, and count of the number of times the event triggered.

Other event systems sometimes have this, with such events as "last event repeated 39003 times", to prevent the system from clogging up with events.

My system was more complex. For one thing, an attacker may deliberately intermix events, so it can't simply be one event that gets coalesced this way. For another thing, the attacker could sweep targets or spoof sources. Thus, coalescing needed to aggregate events over address ranges as well as time.
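
Here's a sketch of what such a coalesced event record might look like. The actual BlackICE structures were surely different; these fields are just illustrative of the start-time/end-time/count/address-range idea.

    #include <time.h>
    #include <stdint.h>

    struct coalesced_event {
        int        signature_id;   /* which signature fired */
        time_t     first_seen;     /* start-time */
        time_t     last_seen;      /* end-time */
        uint64_t   count;          /* how many times it triggered */
        uint32_t   src_lo, src_hi; /* aggregated source address range */
        uint32_t   dst_lo, dst_hi; /* aggregated target address range */
    };

    /* fold a new trigger into an existing record instead of emitting
       300,000 individual events per second */
    static void coalesce(struct coalesced_event *e, time_t now,
                         uint32_t src, uint32_t dst)
    {
        e->count++;
        e->last_seen = now;
        if (src < e->src_lo) e->src_lo = src;
        if (src > e->src_hi) e->src_hi = src;
        if (dst < e->dst_lo) e->dst_lo = dst;
        if (dst > e->dst_hi) e->dst_hi = dst;
    }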

Slammer easily filled a gigabit link with 300,000 packets-per-second. Every packet triggered the signature, thus creating 300,000 events-per-second. No system could handle that. To keep up with the load, events had to be reduced somehow.

My event coalescing logic worked. It reduced the load from 300,000 events-per-second down to roughly 500. This was still a bit more than the system could handle when forwarding to the remote management system. Customers reported that at their consoles, they saw the IDS slowly fall behind, spooling events at the sensor and struggling to ship them up to the management system.

This problem remains a big flaw in IDSs to this day. Snort often has signatures that throw away the excess data, but it's still easy to flood them with packets that overload their event logging.

What was exciting for me is that I'd designed all this in theory, tested it using artificial cases, unsure how it would stand up to the real world. Watching it stand up to the real world was gratifying: big customers saw it work in practice, with the only complaint being that the centralized console fell behind a little.


The point is that I made three radical design choices, unprecedented at the time though more normal now, and they worked. And yet, the industry wasn't technical enough to recognize that it worked.

For example, a few months later I had a meeting at the Pentagon where a Gartner analyst gave a presentation claiming that only hardware-based IDS would work, because software-based IDS couldn't keep up. Well, these were my customers. I didn't refute Gartner so much as my customers did, with their techies standing up and pointing out that when Slammer hit, my "software" product did keep up. Gartner doesn't test products themselves. They rightly identified the problem with other software using interrupts, but couldn't conceive there was a third alternative, "poll mode" drivers.

I apologize to you, the reader, for subjecting you to this vain bitching, but I just want to get this off my chest.

Uncategorized

I should write up a larger technical document on this, but in the meanwhile, here is this short(-ish) blogpost. Everything you know about RISC is wrong. It's some weird nerd cult. Techies frequently mention RISC in conversation, with other techies nodding their heads in agreement, but it's all wrong. Somehow everyone has been mind controlled to believe in wrong concepts.

An example is this recent blogpost which starts out saying that "RISC is a set of design principles". No, it wasn't. Let's start from this sort of viewpoint to discuss this odd cult.

What is RISC?

Because of the march of Moore's Law, every year, more and more parts of a computer could be included onto a single chip. When chip densities reached the point where we could almost fit an entire computer on a chip, designers made tradeoffs, discarding unimportant stuff to make the fit happen. They made tradeoffs, deciding what needed to be included, what needed to change, and what needed to be discarded.

RISC is a set of creative tradeoffs, meaningful at the time (early 1980s), but which were meaningless by the late 1990s.

The interesting part of CPU evolution is the four decades between 1964, with IBM's System/360 mainframe, and 2007, with Apple's iPhone. The key feature was a 32-bit core with memory-protection, allowing isolation among different programs with virtual memory. These were real computers, from the modern perspective: real computers have at least 32 bits and an MMU (memory management unit).

The year 1975 saw the release of the Intel 8080 and the MOS 6502, but these were 8-bit systems without memory protection. This was the point of Moore's Law where we could get a useful CPU onto a single chip.

In the year 1977 we saw DEC release its VAX minicomputer, having a 32-bit CPU w/ MMU. Real computing had moved from insanely expensive mainframes filling entire rooms to less expensive devices that merely filled a rack. But the VAX was way too big to fit onto a chip at this time.

The really interesting evolution happened in 1980 with Motorola's 68000 (aka 68k) processor, essentially the first microprocessor that supported real computing.

But this comes with caveats. Making a microprocessor required creative work to decide what wasn't included. In the case of the 68k, it had only a 16-bit ALU. This meant adding two 32-bit registers required passing them through the ALU twice, adding each half separately. Because of this, many call the 68k a 16-bit rather than a 32-bit microprocessor.

More importantly, only the lower 24-bits of the registers were valid for memory addresses. Since it's memory addressing that makes a real computer "real", this is the more important measure. But 24-bits allows for 16-megabytes of memory, which is all that anybody could afford to include in a computer anyway. It was more than enough to run a real operating system like Unix. In contrast, 16-bit processors could only address 64-kilobytes of memory, and weren't really practical for real computing.

The 68k didn't come with an MMU, but it allowed an extra MMU chip. Thus, the early 1980s saw an explosion of workstations and servers consisting of a 68k and an MMU. The most famous was Sun Microsystems, launched in 1982 with its own custom-designed MMU chip.

Sun and its competitors, running Unix, transformed the industry. Many point to IBM's PC from 1981 as the transformative moment in computer history, but these were non-real 16-bit systems that struggled with more than 64k of memory. IBM PC computers wouldn't become real until 1993 with Microsoft's Windows NT, supporting full 32-bits, memory-protection, and pre-emptive multitasking.

But except for Windows itself, the rest of computing is dominated by the Unix heritage. The phone in your hand, whether Android or iPhone, is a Unix computer that inherits almost nothing from the IBM PC.

These 32-bit Unix systems from the early 1980s still lagged behind DEC's VAX in performance. The VAX was considered a mini-supercomputer. The Unix workstations were mere toys in comparison. Too many tradeoffs were made in order to fit everything onto a single chip, too many sacrifices made.

Some people asked "What if we make different tradeoffs?"

Most people thought the VAX was the way of the future, and were all chasing that design. The 68k CPU was essentially a cut-down VAX design. But history had anti-VAX designs that worked very differently, notably the CDC 6600 supercomputer from the 1960s and the IBM 801/ROMP processor from the 1970s.

It's not simply one tradeoff, but a bunch of inter-related tradeoffs. They snowball -- each choice you make changes the costs-vs-benefit analysis of other choices, changing them as well.

This is why people can't agree upon a single definition of RISC. It's not one tradeoff made in isolation, but a long list of tradeoffs, each part of a larger scheme.

In 1987, Motorola shipped its 68030 version of the 68k processor, chasing the VAX ideal. By then, we had ARM, SPARC, and MIPS processors that significantly outperformed it. Given a budget of roughly 100,000 transistors allowed by Moore's Law of the time, the RISC tradeoffs were better than VAX-like tradeoffs.

So really, what is RISC?

Let's define things in terms of 1986, comparing the [ARM, SPARC, MIPS] processors called "RISC" to the [68030, 80386] processors that weren't "RISC". They all supported full 32-bit processing, memory-management, and preemptive multitasking operating systems like Unix.

The major ways RISC differed were:

  • fixed-length instructions (32-bits or 4-bytes each)
  • simple instruction decoding
  • horizontal vs. vertical microcode
  • deep pipelines of around 5 stages
  • load/store aka reg-reg
  • simple address modes
  • compilers optimized code
  • more registers

If you are looking for the one thing that defines RISC, it's the thing that nobody talks about: horizontal microcode.

The VAX/68k/x86 architecture decoded external instructions into internal control ops that were pretty complicated, supporting such things as loops. Each external instruction executed an internal microprogram with a variable number of such operations.

The classic RISC worked differently. Each external instruction decoded into exactly 4 internal ops. Moreover, each op had a fixed purpose:

  1. read from two registers into the ALU (arithmetic-logic unit)
  2. execute a math operation in the ALU
  3. access memory (well, the L1 cache)
  4. write results back into one register

(This explanation has been fudged and simplified, btw).

This internal detail was expressed externally in the instruction set, simplifying decoding. The external instructions specified two registers to read, an ALU opcode, and one register to write. All of this fit into a constant 32 bits. In contrast, the [68k/x86/VAX] model meant complex decoding of instructions with a large ROM containing microprograms.
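
To see why this matters, here's a sketch of decoding a made-up fixed-format 32-bit instruction. The field positions are invented for illustration; this is not the real MIPS, SPARC, or ARM layout.

    #include <stdint.h>

    struct decoded { unsigned opcode, rd, rs1, rs2; };

    static struct decoded decode(uint32_t insn)
    {
        struct decoded d;
        d.opcode = (insn >> 26) & 0x3f;   /* 6-bit ALU opcode             */
        d.rd     = (insn >> 21) & 0x1f;   /* 5 bits: register to write    */
        d.rs1    = (insn >> 16) & 0x1f;   /* 5 bits: first register read  */
        d.rs2    = (insn >> 11) & 0x1f;   /* 5 bits: second register read */
        return d;   /* remaining bits: immediates etc. -- the point is that
                       decoding is a few masks and shifts, not a microcode ROM */
    }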

Roughly half (50%) of the 68000's transistors went to this complex decoding logic and ROM. In contrast, for RISC processors, it was closer to 1%. All those transistors could be dedicated to other things. See how tradeoffs snowball? Saving so many transistors on instruction decoding meant being able to support other features elsewhere. There was a cost, however: RISC needed multiple instructions to do the same thing as a single [68k/x86/VAX] instruction.

This meant instructions could be deeply pipelined, with instructions overlapped. When reading registers for the current instruction, we can simultaneously be fetching the next instruction and performing the ALU calculation for the previous one. The classic RISC pipeline had 5 stages (the 4 mentioned above plus 1 for fetching the next instruction). Each clock cycle would execute part of 5 instructions simultaneously, each at a different stage in the pipeline.

This was called scalar operation. In previous processors, it would take a variable number of clock cycles for an instruction to complete. In RISC, every instruction had a 5 clock cycle latency from beginning to end. And since execution was overlapped/pipelined, executing 5 instructions at a time, the throughput was one instruction per clock cycle.

All CPUs are pipelined to some extent, but they need complex interlocks to prevent things from colliding with each other, such as two pipelined instructions trying to read registers at the same time. RISC removed most of those interlocks by strictly regulating what an instruction could do in each stage of the pipeline. Removing these interlocks reduced transistor count and sped things up. This could be one possible definition of RISC that you never hear: it got rid of the interlocks found in other processors.

Some pipeline conflicts were worse. Because of pipelining, the results of an instruction aren't available until many clock cycles later. What if one instruction writes its result to register #5 (r5), and the very next instruction attempts to read from register #5 (r5)? It's too soon; it has to wait more clock cycles for the result.

The answer: don't do that. Assembly language programmers need to know this complication, and are told to simply not write code that does this, because then the program won't work.

This was anathema at the time. Throughout history to this point, each new CPU architecture also had a new operating-system written in assembly language, with many applications written in assembly language. Thus, a programmer-friendly assembly language was considered one of the biggest requirements for any new system. Requiring programmers to know quirks that lead to buggy code was simply unacceptable. Everybody knew that programmer-hostile instruction-sets would never work in the market, even if they performed faster and cheaper.

But technology is littered with examples of what everybody knew being wrong. In this case, by 1980 we had the C programming language, essentially a "portable assembly language", and the Unix operating system written in C. The only people who needed to know about a quirky assembly language were the compiler writers. They would take care of all such problems.

That's why the history lesson above talks about Unix and real computing. Without Unix and C, RISC wouldn't have happened. An operating-system written in a high-level language was a prerequisite for RISC. It's as important an innovation as Moore's Law allowing 100,000 transistors to fit on a chip.

Because of the lack of complex decoding logic, the transistor budget was freed up to support such things as more registers. The Intel x86 architecture famously had 8 registers, while the RISC competitors typically had as many as 32. The limitation was encoding space. It takes 5 bits to specify one of 32 possibilities. Given that nearly every instruction specified two registers to read from and one register to write to, that's 15 bits, or half of the instruction space, leaving 17 bits for other purposes.

The creators of RISC, Hennessy and Patterson, wrote a textbook called "Computer Architecture: A Quantitative Approach". It's horrible. It imagines a world where people need to be taught tradeoffs and transistor budgets. But there is no approach other than a quantitative one; it's like an economics textbook called "Economics: A Supply And Demand Approach". While the textbook has a weird obsession with quantitative theory, it misses the non-quantitative tradeoffs, like the fact that RISC couldn't have happened without C and Unix.

Among the snowballing tradeoffs is the load/store architecture combined with fewer addressing modes. It's here that we need to go back and discuss history -- what the heck is an "addressing mode"????

In the beginning, computers had only a single general purpose register, called the accumulator. All calculations, like adding two numbers together, involved reading the second value from memory and combining it with the first value already in the accumulator. All calculations, whether arithmetic (add, subtract) or logical (AND, OR, XOR), involved one value already in the register and another value from memory.

Addresses have to be calculated. For example, when accessing elements in a table, we have to take the row number, multiply it by the size of a row, add an offset into the row for the desired column, then add all that to the address of the start of the table. Then after calculating this address, we often want to increment the index to fetch the next row.

If the table base address and row index are held in registers, we might get a complex instruction like the following. This calculates an address using two registers r10 and r11, fetches that value from memory, then adds it into register r9.

 ADD r9, [r10 + r11*8 + 4]
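
Written out in C, treating the registers as plain variables purely for illustration, that single instruction is doing all of this:

    #include <stdint.h>

    int64_t add_from_table(int64_t r9, const char *r10, int64_t r11)
    {
        /* base + index*scale + offset, then the load, then the add --
           all packed into the one instruction above */
        return r9 + *(const int64_t *)(r10 + r11 * 8 + 4);
    }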

Such calculations embedded in the instruction-set were necessary for such early computers. While they had only a single general purpose register (the accumulator), they still had multiple special purpose registers used this way for address calculations. 

For complex computers like the VAX, such address modes embedded in instructions were no longer necessary, but still desirable. Half the work of the computer is calculating memory addresses. It's very tedious for programmers to do it manually, easier when the instruction-set takes care of common memory access patterns (like accessing cells within a table).

This leads us to the load/store issue.

With many registers, we no longer need to read another value from memory (a reg-mem calculation). We can instead perform the calculation using two registers (reg-reg). The VAX had such reg-reg instructions, but programmers still mostly used the reg-mem instructions with the complex address calculations.

RISC changed this. Calculations were now exclusively reg-reg, where math operations like addition could only operate on registers. To add something from memory, you needed first to load it from memory into a register, using a separate, explicit instruction. Likewise, writing back to memory required an explicit store operation.

This architecture can be called either reg-reg or load/store, with the second name being more popular.

With RISC, addressing modes were still desirable, but now they applied to only the two load and store instructions.

The available addressing modes were constrained by the limited RISC pipeline and limited 32-bit fixed-length instructions. Since the pipeline allowed for the reading of two registers at the start, adding two registers together to form the address was allowed. The example shown above was too complex, though.

What you are supposed to take away from all of this is that all of these tradeoffs are linked. Each decision that diverges from the ideal VAX-like architecture snowballed into other decisions that drifted further and further from that ideal, until what we had was something that looked nothing like a VAX.

The upshot of these decisions was being able to reduce a 32-bit MMU CPU to roughly a single chip because it needed fewer transistors, while at the same time performing much faster. It required maybe twice as many instructions to perform the same tasks (mostly due to needing more explicit address calculations given the lack of addressing modes), but performed them maybe 5 times faster, for a significant speedup.

At the time, the VAX was the standard benchmark target. When Sun shipped its first SPARC RISC systems (the Sun-4), they benchmarked about twice as fast as the latest VAX systems, while being considerably cheaper.

The end of RISC

By the late 1980s, everybody knew that RISC was the future. Sure, Intel continued with its x86 and Motorola with its 68000, but that's because the market wanted backwards compatibility with legacy instruction-sets. Both attempted to build their own RISC alternatives, but failed. When backwards compatibility wasn't required, everybody created RISC processors, because for 32-bit MMU real computing, they were better. And everybody knew it.

But of course, everybody was eventually wrong. Even as early as the 80486 in 1989, Intel was converting the innards of the processor into something that looked more RISC-like.

The nail in the coffin came in 1995 with Intel's Pentium Pro processor that supported out-of-order (or OoO) processing. Again, it wasn't really a new innovation. Out-of-order instructions first appeared on 1960s era supercomputers from CDC and IBM. This was the first time that transistor budgets allowed it to be practically used on single-chip microprocessors.

Transistor budgets were so high that designers no longer had to make basic painful tradeoffs. The decisions necessary to cram everything into 100,000 transistors were no longer meaningful when you had more than 1-million transistors to work with. Instruction-set decoding requiring 20k transistors is important with small budgets, but meaningless with large budgets.

With OoO, the microarchitecture inside the chip looks roughly the same, regardless if it's an Intel x86, ARM, SPARC, or whatever.

This was proven in benchmarks. When Intel released its out-of-order Pentium Pro in 1995, it beat all the competing in-order RISC processors on the market.

Everybody was wrong -- RISC wasn't the future, the future was OoO.

One way of describing the Pentium Pro is that it "translates x86 into RISC-like micro-ops". What that really means is that instead of vertical microcode, it translated things into horizontal, pipelined micro-ops. Most of the typical math operations were split into two micro-ops, one a load/store operation, and the other a reg-reg operation. (Some x86 instructions need even more micro-ops: address calculation, then load/store, then ALU op).

Intel still has an "x86 tax" decoding complex instructions. But in terms of pipeline stages, that tax only applies to the first stage. Typical OoO processors have at least 10 more stages after that. Even RISC instruction-set processors like ARM must translate external instructions into internal micro-ops.

The only significant difference left is the fact that Intel's instructions are variable length. The fixed-length instructions of RISC mean that multiple instructions can be fetched at once and decoded in parallel. This is impossible with Intel x86; instructions must at least partially be decoded serially, one before the next. You don't know where the next instruction starts until you've figured out the length of the current instruction.

Intel and AMD find crafty ways to get around this. For example, AMD has often put hints in its instruction cache (L1I) so that decoders can know the length of instructions. Intel has played around with "loop caches" (so-called because they are most useful for loops) that track instructions after they've been decoded, so they don't need to be decoded again.

The upshot is that for most code, there's no inherent difference between x86 and RISC, they have essentially the same internal architecture for out-of-order (OoO) processors. No instruction-set has an inherent advantage over the other.

And it's been that way since 1995.

I mention this because bizarrely this cult has persisted for the 30 years since OoO replaced RISC for high-end real computers. It ceased being a useful technical distinction, so why are techies still discussing it?

They persist in believing dumb things, for which no amount of deprogramming is possible. For example, they look at mobile (battery powered) devices and note that they use ARM chips to conserve power. They make the assumption that there must be some sort of inherent power-efficiency advantage.

This isn't true. These chips consume less power by simply being slower. Fewer transistors mean less power consumption. This meant that while desktops/servers used power-hungry OoO processors, mobile phones went back to the transistor budgets of yesteryear, meaning back to in-order RISC.

But as Moore's Law turned, transistors got smaller, to the point where even mobile phones got OoO chips. They use clever tricks to keep that OoO chip powered down most of the time, often including an in-order chip that runs slower on less power for minor tasks.

We've reached the point where mobile and laptops now use the same chips: your MacBook uses (essentially) the same chip as your iPhone, which is the same chip as Apple desktops.

Now Apple's M1 ARM (and hence RISC) processor is much better at power consumption than its older Intel x86 chips, but this isn't because it's RISC. Apple did a good job of analyzing what people do on mobile devices like laptops and phones and optimized for that. For example, they added a lot of great JavaScript features, cognizant of the ton of online and semi-offline apps that are written in JavaScript. In contrast, Intel attempts to optimize a chip simultaneously for laptops, desktops, and servers, leading to poor optimization for laptops.

Apple also does crazy things like putting a high end GPU (graphics processor) on the same chip. This has the effect of making their M1 ARM CPU crazy good for desktops for certain applications, those requiring the sorts of memory-bandwidth normally needed by GPUs.

But overall, x86 chips from AMD and Intel are still faster on desktops and servers.

In addition to the fixed-length instructions providing a tiny benefit, ARM has another key advantage, but it has nothing to do with RISC. When they upgraded their instruction-set to support 64-bit instead of just 32-bit, they went back and redesigned it from scratch. This allowed them to optimize the new instruction-set for the OoO pipeline, such as removing some dependencies that slow things down.

This was something that Intel couldn't do. When it came time to support 64-bit, AMD simply extended the existing 32-bit instructions. A long sequence of code often looks identical between the 32-bit and 64-bit versions of the x86 instruction-sets, whereas they look completely different on ARM 32-bit vs. 64-bit.

What about RISC-V and ARM-on-servers?

We've reached the point in tech where the instruction-set doesn't matter. It's not simply that code is written in high-level language. It's mostly that micro-architectural details have converged.

Take byte-order, for example. Back in the 1980s, most of the major CPUs in the world were big-endian, while Intel bucked the trend being little-endian. The reason is that some engineer made a simple optimization back when the 8008 processor was designed for terminals, and because of backwards compatibility, the poor decision continues to plague x86 today.

Except when it annoys programmers debugging memory dumps, byte-order doesn't matter. Therefore, all the RISC processors allowed a simple bit to be set to switch the processor between big-endian and little-endian mode.

Over time, that has caused everyone to match Intel's little-endianness, driven primarily by Linux. The kernel itself supports either mode, but a lot of drivers depend upon byte-order, and user-mode programs developed on x86 sometimes have byte-order bugs. As Linux was ported to architectures like ARM or PowerPC, most of the time it was done in little-endian mode. (You can get PowerPC Linux in big-endian, but the preference is little-endian, because drivers.)
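
Here's the sort of byte-order bug that drives this (a generic illustration, not any particular driver): code that reads a multi-byte integer straight out of a network or file buffer works on x86 but breaks on a big-endian port, whereas assembling the bytes explicitly works everywhere.

    #include <stdint.h>
    #include <string.h>

    uint32_t read_length_buggy(const unsigned char *buf)
    {
        uint32_t len;
        memcpy(&len, buf, 4);      /* gives different answers on big- vs
                                      little-endian machines */
        return len;
    }

    uint32_t read_length_portable(const unsigned char *buf)
    {
        /* assemble the value explicitly (here: little-endian wire format) */
        return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
               ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    }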

The same effect happens even in things that aren't strictly CPU related, like memory and I/O. The tech stack has converged so that processors look more and more alike except for the instruction-set.

The convergence of architecture is demonstrated most powerfully by Apple's M1 transition, where they stopped using Intel's processors in their computers in favor of their custom ARM processor they created for the iPhone.

The MacBook Air M1 looks identical on the outside compared to the immediately preceding x86 MacBook. But more to the point, it performs almost identically running x86 code -- it runs x86 code at native x86 speeds but on an ARM CPU. The processors are so similar architecturally that instruction-sets could be converted on the fly -- it simply reads the x86 program, converts it to ARM transparently, then runs the ARM version. Previous code-translation attempts have incurred massive slowdowns to account for architectural differences, but the M1 cheated by removing any differences that weren't instruction-set related, allowing smooth translation of the instructions.

Technically, instruction-sets don't matter, but for business reasons, they still do. Intel and AMD control x86, and prevent others from building compatible processors. ARM lets others build compatible processors (indeed, making no CPUs themselves), but charges them a license fee.

Especially for processors on the low-end, people don't want to pay license fees.

For that reason, RISC V has become popular. For low-end processors (in-order microcontrollers competing against ARM Cortex Ms) in the 100,000 transistor range, it matters that an instruction-set be RISC. The only free alternative is the aging MIPS. It has annoying quirks, like "delay slots", which are fixed by RISC V. Since RISC V is an open standard, free of license fees, those designing their own low end processor have adopted it.

For example, nVidia uses RISC V extensively throughout its technology. GPUs contain tiny embedded CPUs to manage things internally. They have ARM licenses, but they don't want to pay the per-unit pennies that ARM charges. Likewise, Western Digital (a big hard-drive maker) designed a RISC V core for its drives.

There are a lot of RISC V fans due to the RISC cult who insist it should go everywhere, but it's not going anywhere for high-end processors. At the high-end, you are going to pay licensing fees for designs anyway. In other words, while big companies have the resources to design small in-order processors, they don't have the resources to design big OoO processors, and would therefore buy designs from others.

Amazon's AWS Graviton is a good example (ARM-based servers). They aren't licensing the instruction-set from ARM so much as the complete OoO CPU design. They include the ARM cores on a chip of Amazon's design, having memory, I/O, and security features tailored to AWS use cases. Neither the instruction-set architecture nor the micro-architecture particularly matters to Amazon compared to all the other features of their chips.

Lots of big companies are getting into the custom CPU game, licensing ARM cores. Big tech companies tend to have their own programming language, their own operating systems, their own computer designs, and nowadays their own CPUs. This includes Microsoft, Google, Apple, Facebook, and so on. The advantage of ARM processors (or in the future, possibly RISC V processors) isn't their RISC nature, or their instruction-sets, but the fact that they are big processor designs that others can include in their own chips. There is no inherent power efficiency or speed benefit -- only the business benefit.

Conclusion

This blogpost is in reaction to the blogpost I link above. That writer just recycles old RISC rhetoric of the past 30 years, like claiming it's a "design philosophy". No, it was a set of tradeoffs meaningful to small in-order chips -- the best way of designing a chip with 100,000 transistors.

The term "RISC" has been obsolete for 30 years, and yet this nonsense continues. One reason is the Hennessy and Patterson textbook that indoctrinates the latest college students. Another reason is the political angle, people hating whoever is dominant (in this case, Intel on the desktop). People believe in RISC, people evangelize RISC. But it's just a cult, it's all junk. Any conversation that mentions RISC can be improved by removing the word "RISC".

OoO replaced RISC as the dominant architecture for CPUs, and it did so in 1995, and ever since then, the terminology "RISC" has been obsolete. The only thing you care about when looking at chips is whether it's an in-order design or an out-of-order design. Well, that's if you care about theory. If you care about practice, you care about whether it supports your legacy tooling and code. In the real world, whether you use x86 or ARM or MIPS or PowerPC is simply a matter of legacy market conditions. We still launch rockets to Mars using PowerPC processors because that's what the market for radiation-hardened CPUs has always used.

Uncategorized
In this blogpost, I describe the Synology DS620slim. Mostly these are notes for myself, so when I need to replace something in the future, I can remember how I built the system. It's a "NAS" (network attached storage) server that has six hot-swappable bays for 2.5 inch laptop drives.

That's right, laptop 2.5 inch drives. It makes this a tiny server that you can hold in your hand.

The purpose of a NAS is reliable storage. All disk drives eventually fail. If you stick a USB external drive on your desktop for backups, it'll eventually crash, losing any data on it. A failure is unlikely tomorrow, but a spinning disk will almost certainly fail some time in the next 10 years. If you want to keep things, like photos, for the rest of your life, you need to do something different.

The solution is RAID, an array of redundant disks such that when one fails (or even two), you don't lose any data. You simply buy a new disk to replace the failed one and keep going. With occasional replacements (as failures happen) it can last decades. My older NAS is 10 years old and I've replaced all the disks, one slot replaced twice.

This can be expensive. A NAS requires a separate box in addition to lots of drives. In my case, I'm spending $1500 for 18 terabytes of disk space that would cost only $400 as an external USB drive. But amortized over the expected 10+ year lifespan, I'm paying $15/month for this home system.

This unit is not just disk drives but also a server. Spending $500 just for a box to hold the drives is a bit expensive, but the advantage is that it's also a server that's powered on all the time. I can set up tasks to run on a regular basis that would break if I tried to regularly run them on a laptop or desktop computer.

There are lots of do-it-yourself solutions (like the Radaxa Taco carrier board for a Raspberry Pi 4 CM running Linux), but I'm choosing this solution because I want something that just works without any hassle, that's configured for exactly what I need. For example, eventually a disk will fail and I'll have to replace it, and I know now that this is something that will be effortless when it happens in the future, without having to relearn some arcane Linux commands that I've forgotten years ago.

Despite this, I'm a geek who obsesses about things, so I'm still going to do possibly unnecessary things, like upgrading hardware: memory, network, and fan for an optimized system. Here are all the components of my system:
You can save a bunch of money by going down to 4TB drives (and a 14TB backup USB drive), but I chose the larger 5TB drives.

Disk Drives

The most important reason for choosing this product is the smaller 2.5-inch disk drives (sized for laptops). Otherwise, you should buy one of the larger (much larger) systems that'll hold standard-sized drives.

The drives will be the largest cost. A 5TB spinning disk costs ~$150, or an 8TB SSD costs ~$700. Buying 6 of them is your largest investment. You don't have to fill up the system, or buy the largest drives, but if you put in the time and effort, you might as well go all the way. On a cost-per-gigabyte basis, the larger drives seem to be the best price.

As you know, there are only three manufacturers remaining for spinning rust drives: Seagate, Western Digital (WD), and Toshiba. Also as you know, laptops have moved away from rotating disks, adopting SSDs instead. Thus, the 2.5 inch form factor for spinning disks is likely dead. For right now, they are a lot cheaper than SSDs, a fifth of the price. In the future, when a drive dies on the array, I'll likely have to replace it with an SSD, because a replacement spinning disk is no longer available. The SATA SSD itself is eventually going to disappear (to be replaced by NVMe SSDs), but they should still be around a decade from now when I need replacement drives. (I plan on the NAS lasting a decade before I have to upgrade and move the data).

The internal 5TB drives are a bit expensive. One strategy would be to instead buy external USB drives and "shuck" them, removing the USB enclosure to get at the drives themselves. It's a common strategy because, under certain market conditions, external drives are cheaper than internal drives. I tried buying a $100 5TB Western Digital external drive. It didn't work -- it wasn't a SATA drive in a USB enclosure, but was natively USB on the circuit board. I'm using it as a Raspberry Pi 4 drive instead for storing blockchain info.

Inserting the drive into the 620slim is easy: just pop out the carrier, add the drive, and pop it back in. The carrier comes with little posts on one side that fit the screw holes, meaning you only need to screw in the other side with 2 screws -- or you can forgo the screws altogether.

The carriers have locks, to prevent people from accidentally pulling out a drive, but I don't use them. In 5 years when a drive fails and I need to replace it, I don't want to go hunting for these keys. The entire strategy I'm using here is that when failure happens, I'll fix it right away rather than finding reasons to procrastinate. I've had to replace 3 failed drives in my previous NAS, and this worked well.

Memory

The DS620slim comes with 2-gigabytes of memory, in a single SO-DIMM slot. There's a second empty SO-DIMM slot. (SO-DIMMs are the smaller form factor for memory that's intended for notebook computers and tiny servers).

Synology will officially sell you a 4-gig SO-DIMM to put in the empty slot, bringing total memory to 6-gigs.

Unofficially, you can get two of these, using the second to replace the existing 2-gigs, bringing it to 8-gigs total.

Even more unofficially, you can go to 16gigs. According to Intel's official spec sheet for the J3355 CPU, it only supports 8-gigs. Such numbers are usually conservative, reflecting the memory available at the time. When larger capacities appear later, they usually work. Such is the case here, where I put in 16-gigs total using Crucial SO-DIMMs (two 8-gig DIMMs).

I recommend expanding memory here, if only an extra 2gig DIMM to fill that free space. It's a quick and easy replacement, just unscrew the bottom plate and insert the memory.

Ethernet

The unit only comes with gigabit Ethernet. This can be a bottleneck, so we want to speed that up.

It comes with two Ethernet ports, which support aggregation, but I couldn't get a speed increase. It seems they'll speed things up if there are at least two devices talking to the NAS, but won't speed up when there's only one client. But then, if you have two clients, then things will slow down anyway, because accesses are no longer sequential.

The solution is to use a faster Ethernet adapter, like 2.5gig, 5gig, or 10gig. There's no PCIe slot in the device, but it does have USB 3. I can therefore use a 2.5gbps or 5gbps dongle.

I benchmarked the three options, and found the following performance, in mbps (mega-bits per second). This was measured with large sequential transfers; small or random transfers are roughly the same speed, around 350mbps, for all three adapters.


There's a big jump in performance using the 2.5gbps adapter, but only a marginal increase using the 5gbps adapter.

Synology doesn't support the adapters directly. To install them, I used the following steps with the following project:
  1. Enable SSH, using (Control Panel -> Terminal). If you are a geek, you've already done this.
  2. Go to this GitHub project and download the r8152-apollolake-2.15.0-5.spk file (from the Releases section) to your local computer. Your DS620slim has an Apollo Lake CPU, so that's the package we are using.
  3. Use the "Package Center" to do a "Manual" install, and upload this SPK file. If you get an error saying you don't have permissions, log out and back in. Otherwise, you'll first get a warning saying the driver isn't supported by Synology, and eventually you'll get the error "Failed to install package". This is supposed to happen.
  4. From the SSH command-line, run the command:
     sudo install -m 4755 -o root -D /var/packages/r8152/target/r8152/spk_su /opt/sbin/spk_su
  5. Now repeat the step using "Package Center" to do a "Manual" install. If you didn't close the window that you had open, you can just click on the "Done" button a second time and it'll work.
  6. Now reboot, and plug in the USB adapter.
For 5-gbps, you can go through the same process to install the Aquantia aqc111 drivers. I did this to get a Sabrent NT-SS5G adapter to work.

In practice, when transferring large files, you still aren't going to be able to exceed 2.5gbps much, so I just use the slower adapter. It's cheaper and uses a lot less electrical power (a 2.5gbps Ethernet adapter is noticeably cooler than a 5gbps, which is in turn noticeably cooler than 10gbps).

Fan

The unit comes with a small fan that by default will run in "quiet" mode, but under load, the noise becomes noticeable. A cheap $15 replacement gets a fan that runs a lot quieter, like a Noctua fan famous for this. Replacing the fan doesn't require any tools, as it's held in by rubber thingies.

This allows me to run the fan at a higher speed, with less noise, which keeps everything even cooler. Since I plan on a 10 year lifespan with rotating disks, I figure lower temperatures will be better for longevity.

USB drive backups

RAID6 gives pretty good safety, allowing two drives to fail with no data loss. The term "RAID5" means one redundant disk; the term "RAID6" means two redundant disks.

But you should still do backups. The NAS itself can fail. Or, ransomware can delete all the files. There's lots of possible failures.

One of the neat things with Synology is that it's easy to schedule regular backups to an external USB drive.

In my case, I'm using an 18 terabyte USB drive costing $400 for backups. I just schedule it and forget it, backups always happen, and ransomware on Windows machines can delete everything on the NAS but can't touch the backup.

UPS (Uninterruptable Power Supply)

For a small NAS, I bought a small UPS. This is some weird APC unit that I got on close-out for $100. It's such a weird little product that I don't think it was very popular.

It's a lithium ion UPS. The price of lithium batteries, especially LiFePO4, is approaching the point where they are price competitive with traditional lead acid batteries. This is especially true considering that they last longer in UPS applications than lead acid.

File system

Now with hardware out of the way, let's talk software. Once you insert the drives, plug in the Ethernet, and turn on the power, you access the device with a web browser and configure from there.

There are several choices for how you want to configure RAID and the filesystem.

I chose BTRFS on top of RAID6.

BTRFS is a newer Linux filesystem that's increasingly becoming the default. Its major feature is that it includes checksums for files as part of their metadata (along with filenames and timestamps). This allows the filesystem to detect when a file has become corrupted, so that the file can be repaired. Bits will rot on a hard disk, so files can become corrupted over time even if they are never written to or read. Scrubbing catches and repairs this before it becomes unrecoverable. With Synology, I simply configure it to scrub the entire filesystem every month.

This is not "btrfs-raid", but "btrfs-on-raid6". BTRFS has some experimental RAID built-in, but it's buggy and doesn't really work. Instead, I first create a RAID6 array combining multiple drives into a single virtual drive, then put BTRFS on top of that.

These boxes are designed to allow multiple filesystems to be created, but I simply create one. I do have multiple "shares", though, such as for videos and music, but these are still just directories on the same filesystem.

I also occasionally take "snapshots". I'm not sure how well that works since I've never restored a snapshot, but in principle it'll be quicker than restoring from backups.

Summary

If you are looking for between 16TB and 20TB, for more personal use than a large office, it's rather perfect. Yea, it'll be 4 times more expensive than just getting an external USB drive, but it's RAID and its own server.

It's so cute I got a second one and filled it with 2TB SSDs, for database work that spends a lot of time searching through large databases of poorly indexed data (like password dumps).

Uncategorized

For the Beijing 2022 Winter Olympics, the Chinese government requires everyone to download an app onto their phone. It has many security/privacy concerns, as CitizenLab documents. However, another researcher goes further, claiming his analysis proves the app is recording all audio all the time. His analysis is fraudulent. He shows a lot of technical content that looks plausible, but nowhere does he show anything that substantiates his claims.

Average techies may not be able to see this. It all looks technical. Therefore, I thought I'd describe one example of the problems with this data -- something the average techie can recognize.

His "evidence" consists of screenshots from reverse-engineering tools, with red arrows pointing to the suspicious bits. An example of one of these screenshots is this one:


This screenshot is that of a reverse-engineering tool (Hopper, I think) that takes code and "disassembles" it. When you dump something into a reverse-engineering tool, it'll make a few assumptions about what it sees. These assumptions are usually wrong. There's a process where the human user looks at the analyzed output, does a "sniff-test" on whether it looks reasonable, and works with the tool until it gets the assumptions correct.

That's the red flag above: the researcher has dumped the results of a reverse-engineering tool without recognizing that something is wrong in the analysis.

It fails the sniff test. Different researchers will notice different things first. Famed Google researcher Tavis Ormandy points out one flaw. In this post, I describe what jumps out first to me. That would be the 'imul' (multiplication) instruction shown in the blowup below:


It's obviously ASCII. In other words, it's a series of bytes. The tool has tried to interpret these bytes as Intel x86 instructions (like 'and', 'insd', 'das', 'imul', etc.). But it's obviously not Intel x86, because those instructions make no sense.

That 'imul' instruction is multiplying something by the (hex) number 0x6b657479. That doesn't look like a number -- it looks like four lower-case ASCII letters. ASCII lower-case letters are in the range 0x61 through 0x7A, so it's not the single 4-byte number 0x6b657479 but the 4 individual bytes 6b 65 74 79, which map to the ASCII letters 'k', 'e', 't', 'y' (actually, because "little-endian", reverse order, so "ytek").

No, no. Techies aren't expected to be able to read hex this way. Instead, we are expected to recognize what's going on. I just used a random website to interpret hex bytes as ASCII.
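If you want to check it yourself without a website, here's a quick Python sketch (my own, not part of the researcher's material) that interprets the operand both ways:

    # Interpreting the imul operand 0x6b657479 as ASCII bytes.
    value = 0x6B657479

    as_written = value.to_bytes(4, "big")      # b'kety' -- the bytes in the order the hex is written
    in_memory  = value.to_bytes(4, "little")   # b'ytek' -- the order they sit in memory (little-endian)

    print(as_written.decode("ascii"))   # kety
    print(in_memory.decode("ascii"))    # ytek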

There are 26 lower-case letters, roughly 10% of the 256 possible values for a byte. Thus, the chance that a random 4-byte number will consist of all lower-case letters is about 1-in-10,000. Moreover, multiplication by strange constants is rarer still. You'll commonly see multiplications by small numbers like 48, or by large round numbers like 0x1000000. You pretty much never see multiplication by a number like 0x6b657479, barring something rare like an LCG (linear congruential generator).

QED: this isn't actually an Intel x86 imul instruction, it's ASCII text that the tool has tried to interpret as x86 instructions.

Conclusion

At first glance, all those screenshots by the researcher look very technical, which many will assume supports his claims. But when we actually look at them, none of them support his claims. Instead, it's all just handwaving nonsense. It's clear the researcher doesn't understand them, either.


Uncategorized

The reason you don't really understand NFTs is because the journalists describing them to you don't understand them, either. We can see that when they attempt to sell an NFT as part of their stories (e.g. AP and NYTimes). They get important details wrong.

The latest is Reason.com magazine selling an NFT. As libertarians, you'd think at least they'd get the technical details right. But they didn't. Instead of selling an NFT of the artwork, it's just an NFT of a URL. The URL points to OpenSea, which is known to remove artwork from its site (such as in response to DMCA takedown requests).

If you buy that Reason.com NFT, what you'll actually get is a token pointing to:

https://api.opensea.io/api/v1/metadata/0x495f947276749Ce646f68AC8c248420045cb7b5e/0x1F907774A05F9CD08975EBF7BF56BB4FF0A4EAF0000000000000060000000001

This is just the metadata, which in turn contains a link to the claimed artwork:

https://lh3.googleusercontent.com/8Q2OGcPuODtCxbTmlf3epFGOqbfCbs4fXZ2RcIMnLpRdTaYHgqKArk7uETRdSZmpRAFsNE8KB4sFJx6czKE5cBKB1pa7ovc4wBUdqQ

If either OpenSea or Google removes the linked content, then any connection between the NFT and the artwork disappears.
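You can verify this chain of indirection yourself. Here's a minimal sketch, assuming the Python requests package and the conventional "image" field that NFT metadata usually carries (the exact JSON layout is my assumption; the endpoint may also require an API key or behave differently than it did when this was written):

    # Fetch the token's metadata and print where the "artwork" actually lives.
    import requests

    METADATA_URL = (
        "https://api.opensea.io/api/v1/metadata/"
        "0x495f947276749Ce646f68AC8c248420045cb7b5e/"
        "0x1F907774A05F9CD08975EBF7BF56BB4FF0A4EAF0000000000000060000000001"
    )

    metadata = requests.get(METADATA_URL, timeout=30).json()
    print(metadata.get("name"))
    print(metadata.get("image"))   # a googleusercontent.com URL, not a content hash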

It doesn't have to be this way. The correct way to do NFT artwork is to point to a "hash" instead which uniquely identifies the work regardless of where it's located. That $69 million Beeple piece was done this correct way. It's completely decentralized. If the entire Internet disappeared except for the Ethereum blockchain, that Beeple NFT would still work.
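To illustrate the difference, here's a sketch of what "pointing to a hash" means in practice. The file name is hypothetical and the digest algorithm is just an example (real NFTs typically use IPFS-style content addressing); the point is that a hash identifies the content itself, not a location:

    # Hash-based identification: the token records a digest of the artwork itself.
    import hashlib

    def content_id(path: str) -> str:
        """Return a SHA-256 digest identifying this exact file, wherever it's hosted."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # If the token recorded this digest, anyone holding any copy of the artwork can
    # verify it matches -- no dependence on OpenSea or Google keeping a URL alive.
    print(content_id("artwork.png"))   # hypothetical local copy of the artwork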

This is an analogy for the entire blockchain, cryptocurrency, and Dapp ecosystem: the hype you hear ignores technical details. They promise an entirely decentralized economy controlled by math and code, rather than any human entities. In practice, almost everything cheats, being tied to humans controlling things. In this case, the "Reason.com NFT artwork" is under control of OpenSea and not the "owner" of the token.

Journalists have a problem. NFTs selling for millions of dollars are newsworthy, and it's the journalist's place to report news rather than make judgements, like whether or not it's a scam. But at the same time, journalists are trying to explain things they don't understand. Instead of standing outside the story, simply quoting sources, they insert themselves into the story, becoming advocates rather than reporters. They can no longer be trusted as objective observers.

From a fraud perspective, it may not matter that the Reason.com NFT points to a URL instead of the promised artwork. The entire point of the blockchain is caveat emptor in action. Rules are supposed to be governed by code rather than companies, government, or the courts. There is no undoing of a transaction even if courts were to order it, because it's math.

But from a journalistic point of view, this is important. They failed at an honest description of what the NFT actually contains. They've involved themselves in the story, creating a conflict of interest. It's now hard for them to point out NFT scams when they themselves have participated in something that, from a certain point of view, could be viewed as a scam.



Uncategorized

Tina Peters, the election clerk in Mesa County (Colorado), went rogue and dumped disk images of an election computer onto the Internet, where they are available via BitTorrent [Mesa1][Mesa2]. The Colorado Secretary of State is now suing her over the incident.

The lawsuit describes the facts of the case, how she entered the building with an accomplice on Sunday, May 23, 2021. I thought I'd do some forensics on the image to get more details.

Specifically, I see from the Mesa1 image that she logged on at 4:24pm and was done acquiring the image by 4:30pm, in and (presumably) out in under 7 minutes.

In this blogpost, I go into more detail about how to get that information.


The image

To download the Mesa1 image, you need a program that can access BitTorrent, such as the Brave web browser or a BitTorrent client like qBittorrent. Either click on the "magnet" link or copy/paste it into the program you'll use to download. It takes a minute to gather all the "metadata" associated with the link, but it'll soon start the download:

What you get is a file named EMSSERVER.E01. This is a container file that holds both the raw disk image and some forensics metadata, like the date it was collected, the forensics investigator, and so on. This container is in the well-known "EnCase Expert Witness" format. EnCase is a commercial product, but its container format is a quasi-standard in the industry.

Some freeware utilities you can use to open this container and view the disk include "FTK Imager", "Autopsy", and on the Linux command line, "ewf-tools".
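If you prefer scripting to a GUI, the libewf project also ships Python bindings (pyewf) that read the same container. A minimal sketch, assuming pyewf is installed; which header fields appear depends on how the image was acquired:

    # Open the EnCase/EWF container and dump its acquisition metadata.
    import pyewf

    segments = pyewf.glob("EMSSERVER.E01")   # finds .E01, .E02, ... segment files
    handle = pyewf.handle()
    handle.open(segments)

    print("media size:", handle.get_media_size())
    for key, value in handle.get_header_values().items():
        print(key, "=", value)               # examiner, acquisition date, etc.

    handle.close()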

However you access the E01 file, what you most want to look at is the Windows operating-system logs. These are located in the directory C:\Windows\System32\winevt\Logs. The standard Windows "Event Viewer" application can load these log files to help you view them.

When a USB drive is inserted to create the disk image, that insertion (and everything before it) gets logged and written to disk before the image is taken. Thus, the event files in the image record everything that happened right up to the moment the disk image was acquired.
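If you'd rather script it than click through Event Viewer, the .evtx files can be parsed directly. A sketch assuming the third-party python-evtx package and a Security.evtx file exported from the image; Event ID 4624 is a successful logon and 4625 a failed one:

    # Scan the Security log for logon events (4624 = success, 4625 = failure).
    from Evtx.Evtx import Evtx

    with Evtx("Security.evtx") as log:
        for record in log.records():
            xml = record.xml()
            if ">4624<" in xml or ">4625<" in xml:   # crude but effective EventID match
                print(record.timestamp(), "logon event")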


Disk image acquisition

Here's what the event logs on the Mesa1 image tell us about the acquisition of the disk image itself.

The person taking the disk image logged in at 4:24:16pm, directly to the console (not remotely), on their second attempt after first typing an incorrect password. The account used was "emsadmin". Their NTLM password hash is 9e4ec70af42436e5f0abf0a99e908b7a. This is a "role-based" account rather than an individual's account, but I think Tina Peters is the person responsible for the "emsadmin" role.

Then, at 4:26:10pm, they connected via USB a Western Digital "easystore™" portable drive that holds 5 terabytes. This was mounted as the F: drive.

The program "Access Data FTK Imager 4.2.0.13" was run from the USB drive (F:\FTK Imager\FTK Imager.exe) in order to image the system. The image was taken around 4:30pm, local Mountain Time (10:30pm GMT).

It's impossible to say from this image what happened after it was taken. Presumably, they immediately hit "eject" on the drive, logged off, and disconnected the hard drive. Thus from beginning to end, it took about 7 minutes to take the image once they sat down at the computer.


Dominion Voting Systems

The disk image is that of an "EMS Server", part of the Dominion Voting Suite. This is a server on an air-gapped network (not connected to any other network) within the county offices.

Most manuals for Colorado are online, though some bits and pieces are missing; those can be found in documents posted to other states' websites (though each state does things a little differently, so such cross-referencing can't be completely trusted).

The locked room with an air-gapped network that you see in the Mesa County office appears to match the following "EMS Standard" configuration (EMS stands for Election Management System).


This small network is "air gapped", meaning there is no connection from this network to any other network in the building, nor out to the Internet. By looking at the logs from the Mesa1 image, we can see what this network looks like:
  • The EMS Server is named "EMSERVER" with IP address 192.168.100.10 and MAC address 44-A8-42-30-01-5D. The hard drive matches Dominion's specs: a 1-terabyte boot drive (C:) and a 2-terabyte data drive (D:) that is shared with the rest of the network as \\EMSERVER\NAS. This also acts as the network's DHCP and DNS server.
  • At least one network printer, model Dell E310dw.
  • Two EMS Workstations (EMSCLIENT01 and EMSCLIENT02). This is where users spend most of their time, before an election to create the ballots, and after all the ballots have been counted to construct the final tally.
  • Four ImageCast Central (ICC) (ICC01 - ICC04) scanners, for automatically scanning and tabulating ballots.
  • Two Adjudication Workstations (ADJCLIENT01 and ADJCLIENT03). These are used when the scanners reject ballots, such as when somebody does a write-in candidate, or marks two candidates. Humans need to get involved to make the final judgement on what the ballot actually says.

Note these aren't the machines you'd expect to see in a precinct when you vote (which would predominantly be "ballot marking devices"). These are the machines in the back office that count the votes and store the official results.


Conclusion

What we see here is that "system logs" can tell us a lot of interesting things about the system. There's good reason to retain them in the future.

On the other hand, they generally can't answer the most important question: whether the system was hacked and votes flipped.

Mike Lindell claims to have "Absolute Proof" that Chinese hackers flipped votes throughout the country, including Maricopa County. If so, this would've been the system that the Chinese hackers would've hacked. Yet, in the system image, there is no evidence of this. By this, I mean the Mesa1 image, the one from before the system logs were deleted (obviously, there would be nothing in the Mesa2 image).

This lack of hacking evidence in the logs isn't proof that it didn't happen, though. The fact is, the logs aren't comprehensive enough to record most hacks, and the hackers could've deleted the logs anyway. That's why system logs aren't considered "election records" and why laws don't mandate keeping them: they could have some utility, as I've shown above, but they really wouldn't show the things that we most want to know.






Uncategorized

The Alfa-Trump conspiracy-theory has gotten a new life. Among the new things is a report done by Democrat operative Daniel Jones [*]. In this blogpost, I debunk that report.

If you'll recall, the conspiracy-theory comes from anomalous DNS traffic captured by cybersecurity researchers. In the summer of 2016, while Trump was denying involvement with Russian banks, the Alfa Bank in Russia was doing lookups on the name "mail1.trump-email.com". During this time,  additional lookups were also coming from two other organizations with suspicious ties to Trump, Spectrum Health and Heartland Payments.

This is certainly suspicious, but people have taken it further. They have crafted a conspiracy-theory to explain the anomaly, namely that these organizations were secretly connecting to a Trump server.

We know this explanation to be false. There is no Trump server, no real server at all, and no connections. Instead, the name was created and controlled by Cendyn. The server the name points to exists for transmitting bulk email and isn't really configured to accept incoming connections. It's built for outgoing spam, not incoming connections. The Trump Org had no control over the name or the server. As Cendyn explains, the contract with the Trump Org ended in March 2016, after which they re-used the IP address for other marketing programs, but since they hadn't changed the DNS settings, this caused lookups of the DNS name.

This still doesn't answer why Alfa, Spectrum, Heartland, and nobody else were doing the lookups. That's still a question. But the answer isn't secret connections to a Trump server. The evidence is pretty solid on that point.


Daniel Jones and Democracy Integrity Project

The report is from Daniel Jones and his Democracy Integrity Project.

It's at this point that things get squirrely. All sorts of right-wing sites claim he's a front for George Soros, that he funds Fusion GPS, and that he was involved in the Steele Dossier. That's right-wing conspiracy theory nonsense.

But at the same time, he's clearly not an independent and objective analyst. He was hired to further the interests of Democrats.

If the data and analysis held up, then partisan ties wouldn't matter. But they don't hold up. Jones is clearly trying to be deceptive.

The deception starts by repeatedly referring to the "Trump server". There is no Trump server. There is a Listrak server operated on behalf of Cendyn. Whether the Trump Org had any control over the name or the server is a key question the report should be trying to prove, not a premise. The report clearly acknowledges this fact elsewhere, so it can't be considered a mere mistake, but a deliberate deception.

People assume that a domain name like "trump-email.com" would be controlled by the Trump organization. It wasn't. When Trump Hotels hired Cendyn to do marketing for them, Cendyn did what they normally do in such cases: register a domain with their client's name for the sending of bulk emails. They did the same thing with hyatt-email.com, denihan-email.com, mjh-email.com, and so on. What's clear is that the Trump organization had no control over, and no direct ties to, this domain until after the conspiracy-theory hit the press.


Finding #1 - Alfa Bank, Spectrum Health, and Heartland account for nearly all of the DNS lookups for mail1.trump-email.com in the May-September timeframe.

Yup, that's weird and unexplained.

But the report concludes from this that there were connections, saying the following:

In the DNS environment, if "computer X" does a DNS look-up of "Computer Y," it means that "Computer X" is trying to connect to "Computer Y".

This is false. That's certainly the assumption we usually make, and it's probably true in most cases. But it's not something we insist upon if there's reason to doubt it. And since there's reason to doubt it here, we would need more evidence to make that conclusion.

For example, before the contract was canceled in March 2016, there were DNS lookups for the "mail1.trump-email.com" name from all over the place. That's because the Listrak server was pumping out bulk emails ("spam") promoting Trump Hotels. Servers receiving the emails would often check the identity of the server through DNS lookups, but without any attempt to connect. This fact is footnoted in the Jones report even as it claims otherwise in the main text.

Obviously, that's no longer the case after March 2016, when the contract was canceled. But if Cendyn repurposed the server for something else, such lookups could still happen without connections. The DNS records hadn't changed. So if the server sent out new things from that IP address, unrelated to the Trump Org, it would still cause DNS lookups of the "trump-email.com" domain. It wouldn't mean anybody was trying to connect to the server.
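The distinction is easy to demonstrate yourself. A sketch using the dnspython package (version 2.x; the name no longer resolves today, so substitute any domain you like): the lookup below generates exactly the kind of DNS traffic seen in the logs, while connecting would be a separate, deliberate step that never has to happen.

    # A DNS lookup by itself opens no connection to the target host.
    import dns.resolver

    answers = dns.resolver.resolve("mail1.trump-email.com", "A")   # DNS traffic only
    for rr in answers:
        print(rr.address)

    # Nothing above touched the host itself. Connecting would be a separate action,
    # e.g. socket.create_connection((rr.address, 25), timeout=10).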

This is indeed what Cendyn claims, that they repurposed the resources for their hotel meetings app (whereby hotels can schedule conferences and things on their premises).

It's still suspicious that only those three organizations were involved, but at the same time, it's clearly false to assume this is evidence of connections.


Finding #2 - Comparison with denihan-email.com.

The Jones report compared the DNS logs of trump-email.com with the domain of another Cendyn client, Denihan, another hotel company, for which Cendyn had registered the domain denihan-email.com.

This comparison was obviously bogus. The contract with Cendyn ended in March 2016, after which Cendyn claims it repurposed the server. Jones uses the timeframe August 2016 through September 2016 to compare traffic for the two domains. Of course they'd be different. A valid comparison would use a timeframe before March 2016, when both were clients of Cendyn.

Since Jones documents the fact that the contract between Cendyn and the Trump Org had ended, they are knowingly comparing apples to oranges. Thus, it's not a mistake but a deception.

This also points to the fundamental problem with the data-set. We don't really have a full picture of what happened, such as data going back to 2015. We have a carefully curated subset of the data designed to show just what they want us to see.

Everything points to the trump-email.com domain and Listrak servers being just normal Cendyn infrastructure used for Cendyn's purposes. As far as we can tell, that domain worked the same as those of other Cendyn clients, such as denihan-email.com, hyatt-email.com, mjh-email.com, and so on. These domains are controlled by Cendyn, not their clients. Cendyn in turn points those names at Listrak servers for sending bulk email.


Finding #3 - Missing SPF record

The Jones report points to missing SPF records, arguing that this shows the server is not configured correctly for sending mass emails. It includes this exhibit.


But a review shows that this is the same configuration as for other Cendyn/Listrak bulk email servers. For example, compared to mjh-email.com, we find it's configured the same:


The SPF and DMARC standards were not as widely deployed in 2016, so misconfigurations were common. Moreover, these domains also lacked a DMARC record, and without DMARC, many receivers won't reject the emails even when SPF is bad.

Listrak/Cendyn still fail to have proper DMARC records for their clients, which means that some of their bulk email is getting rejected. They should probably fix that. This doesn't mean Listrak/Cendyn aren't in the bulk email business, only that they could be better at it.
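Anyone can reproduce this comparison. A sketch using dnspython; the records may well have changed since 2016, so treat the output as illustrative rather than as historical evidence:

    # Compare SPF and DMARC records across Cendyn's client domains.
    import dns.exception
    import dns.resolver

    def txt_records(name):
        try:
            return [rr.to_text() for rr in dns.resolver.resolve(name, "TXT")]
        except dns.exception.DNSException:
            return []

    for domain in ("trump-email.com", "mjh-email.com", "denihan-email.com"):
        spf = [t for t in txt_records(domain) if "v=spf1" in t]
        dmarc = txt_records("_dmarc." + domain)
        print(domain, "| SPF:", spf or "none", "| DMARC:", dmarc or "none")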

Thus, we've shown that trump-email.com had the perfectly normal Cendyn SPF records. Far from proving this isn't a bulk email server, the consistency with Cendyn's normal configuration proves unequivocally that it is.


Finding #4 - Accepts emails only from specific senders

The Jones report shows that the server in question (66.216.133.29) accepts incoming email, but rejects email from the public, accepting email only from specific senders. They assume the specific senders would be those from Alfa Bank, Spectrum, and Heartland.

Again, they didn't compare properly to other Cendyn/Listrak systems. If they had, they'd have found that they're all configured the same way. There's an entire subnet of servers you can test this way:


All these servers show the same messages, allowing incoming email connections but not incoming email messages.

This is a vestigial configuration common to bulk email senders. Spammers only send email. One way to test whether somebody is a spammer is to connect back. This configuration makes it appear they'll accept email even if they won't, passing the test.
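That connect-back test is easy to run with Python's standard smtplib. The IP below is the one named in the report and may no longer respond at all; the sender and recipient addresses are just placeholders:

    # Connect-back test: the server greets us but refuses to take a message.
    import smtplib

    with smtplib.SMTP("66.216.133.29", 25, timeout=15) as smtp:
        print(smtp.ehlo())                              # banner/greeting accepted
        print(smtp.mail("test@example.com"))            # envelope sender
        print(smtp.rcpt("postmaster@trump-email.com"))  # expect a rejection here (or at MAIL FROM)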

In no way is this evidence of secret communications. It's not evidence of their claim that somehow Alfa Bank, Spectrum Health, and Heartland would be on the list of allowed senders. We would need additional evidence to make that claim, not an assumption.


Finding #5 - Evidence of human interaction and coordination

The report claims a direct link between Alfa and Trump with the following:
On September 23, 2016, two days after The New York Times approached Alfa Bank, the Trump Organization deleted the email server "mail1.trump-email.com" ... it would have been a deliberate human action taken by a someone working on behalf of the Trump Organization and not by Alfa Bank. An analyst, quoted in the Slate article by Franklin Foer, observed that "the knee was struck in Moscow, and the leg kicked in New York."
This 'finding' is an excellent demonstration of how to identify conspiracy-theories: anomalies that cannot otherwise be explained become proof of the conspiracy. After all, the conspiracy-theory can explain everything.

When I debunked the Alfa-Trump thing back in 2017, reporters grilled me on this specific point. They demanded I come up with an explanation for this coincidence. I told them I had none, but just because I didn't have one, it didn't mean it was proof of the conspiracy theory. There could be lots of explanations, just because we don't know them doesn't mean they don't exist. Just because the conspiracy-theory explains it doesn't mean this is evidence for the conspiracy.

But now we do have another explanation: the FBI called Cendyn on the morning of September 23 and asked them about the domain. As the agent reported back:
“Followed up this morning with Central Dynamics [Cendyn] who confirmed that the mail1.trump-email.com domain is an old domain that was set up in approximately 2009 when they were doing business with the Trump Organization that was never used." -- *
Thus, it's not NYT contacting Alfa Bank that caused the deletion, it's the FBI calling Cendyn. Thus, there's no evidence Alfa Bank or Trump Org were even involved. The evidence is quite clear that only Cendyn was involved.

After Cendyn deleted the domain "mail1.trump-email.com", lookups of that name started to fail. The Jones report notes that Alfa Bank then switched to "trump1.contact-client.com". It weaves this into the conspiracy thusly:
The fact that Alfa Bank was the first entity (IP address) to conduct a DNS look-up for "trump1.contact-client.com" in the data-set could indicate that someone at Alfa Bank was in some manner made aware of the new Trump Organization server name.
The name "contact-client.com" is part of Cendyn's infrastructure. For their "mail1.customer-email.com" domains, there's a matching "customer1.contact-client.com" domain. We can see test that live right now:


This is totally consistent with Cendyn's re-use of the infrastructure for a new purpose, as it would treat both domain names the same. Rather than evidence suggesting human interaction, it's evidence suggesting the opposite, that there was no human interaction.


Finding #6 - The Mandiant report doesn't refute these findings

After this thing hit the news, Alfa Bank hired Mandiant to come to their offices and investigate. Their report was inconclusive. They didn't find anything.

Note the difference in language. Things Mandiant can't explain demonstrate Mandiant's incompetence, while things Jones can't explain prove the conspiracy-theory. If Mandiant's report should be treated as inconclusive and proof of nothing, then so too should the Jones report. The Jones report has even less evidence than the Mandiant report.


Finding #7 - The public statements by Trump et al. are contradictory and incomplete

Duh.

The Trump Org, Alfa, and Spectrum Health have no idea what happened. Their statements are consistent with knowing they don't have secret communications, but not knowing where this DNS data came from. They are unable to refute the allegations, but at the same time, are concerned for their reputations, and behave accordingly. Which, of course, means they guess at what's going on with more confidence than is warranted.

If there were secret communications among them, you'd expect they'd do a better job at coordinating their stories.


Conclusion

In this blogpost, I've refuted all the findings of the Jones report. There is still the question of where this DNS anomaly came from, but the allegation that it proves a secret connection between Alfa Bank and a Trump server is clearly false.

Moreover, I've shown that the Jones report is not merely wrong, but deliberately deceptive. They repeatedly reference a "Trump Organization Server" even though it's quite clear from the text they know that no such server exists.

For example, when Cendyn removed the "mail1.trump-email.com" DNS record, the report described it as the "Trump Organization deleted the email server". It's clear they know that Cendyn simply removed the mail1.trump-email.com record, and that the Listrak server wasn't touched. Yet, they deliberately phrase things this way in order to deceive.

What we have is Alfa Bank doing DNS queries. What we don't have is any connection to the Trump Org. Since Jones couldn't reach the conclusion that the Trump Org was involved based on evidence, he instead made it the premise.

This in turn makes it easy to disprove the entire Jones report: since there's not only no evidence of Trump Org involvement, but quite a lot of evidence that the Trump Org had no control over the domain or servers, it disproves the entire theory that there were secret connections with Alfa Bank.

Uncategorized

One of the most important classic sci-fi stories is the book "Dune" by Frank Herbert. It was recently made into a movie. I thought I'd write a quick review.

The summary is this: just read the book. It's a classic for a good reason, and you'll be missing a lot by not reading it.

But the Dune (2021) movie is very good. The most important thing to know is to see it in IMAX. IMAX is this huge screen technology that partly wraps around the viewer, accompanied by huge speakers that overwhelm you with sound. If you watch it in some other format, what was visually stunning becomes merely very pretty.

This is Villeneuve's trademark, which you can see in his other works, like his Blade Runner sequel. The point is to marvel at the visuals in every scene. The storytelling is just enough to hold the visuals together. I mean, he also seems to do a good job with the storytelling, but it's just not the reason to go see the movie. (I can't really tell -- I've read the book, so I see the story differently than those of you who haven't.)

Beyond the story and visuals, many of the actors' performances were phenomenal. Javier Bardem's "Stilgar" character steals his scenes. Stellan Skarsgård exudes evil. The two character actors playing the mentats were each perfect. I found the lead character (Timothée Chalamet) a bit annoying, but only because he's supposed to be at this point in the story.

Villeneuve splits the book into two parts, and this movie is only the first. This presents a problem, because up until this point in the story, the main character is merely responding to events, not yet the hero who drives them. It doesn't fit the traditional Hollywood model. I really want to see the second film even if the first part, released in the post-pandemic turmoil of the movie industry, doesn't perform well at the box office.

In short, if you haven't read the books, I'm not sure how well you'll follow the storytelling. But the visuals (seen at IMAX scale) and the characters are so great that I'm pretty sure most people will enjoy the movie. And go see it in IMAX in order to get the second movie made!!


Uncategorized