In 1991 the League for Programming Freedom published several articles opposing software patents. It's eight years later, and nothing has changed--or rather, the situation has merely gotten worse.
One problem I have with the League for Programming Freedom's articles is that they take the tactic of attacking software patents on a number of fronts--a classic debating tactic, since if one attack "fails", the other attacks still stand.
But I feel that this multi-pronged attack weakens the central argument, which is present but not made the focus of those articles. So I will lay it out here--but the LPF's papers already called attention to most of this.
The real reason we should not have software patents is because a patent infringes on other people's freedoms. A patent holder can charge arbitrary amounts of money for a patent license, or simply withhold it altogether, preventing a company from using a particular technology.
Of course, this is exactly what patents are there to do. It is important to recall, however, that patents are not granted to inventors because there is a fundamental right to the ownership of new ideas. They are granted because the government believes it provides an incentive to invent--that it is a net improvement for the public good.
The software programming community seems largely against patents. When the only people defending software patents are patent holders who make money from licensing, patent lawyers who make money from patent applications and patent litigation, and the Patent and Trademark Office (who make money from granting patents), is there any reason to believe they are in the public interest? (Oh, there's a fourth category--companies which say they are only acquiring patents so as to be able to defend themselves against more aggressively litigous companies.)
If software patents were only granted to people who had invested a significant amount of work, and who had really invented something new, I would have a lot more trouble marshalling a real argument against them. But that is not what they are being granted for. At least if the ones that didn't meet that standard fell down in court it would be vaguely tolerable. The problem is that the only plausible court defense seems to be prior art, and patents may well have no prior art, and yet be trivial--it all depends on what one accepts as prior art.
For example, it is trivial in software development to take an algorithm from some other problem domain and use it to solve a new, different problem. This is normally so trivial it shouldn't be patentable. But if the first person to do this patents it, there will be no prior art in this strict sense. There is prior art showing the technique being used in another problem domain, but this is often not considered applicable.
Software is different from most domains of patenting. Software is extremely adaptive and combinatorial. Software works through abstract processes on abstract entities. It doesn't need to be tested against the real world. The fundamental design process of software is one of modularity and reuse. Software is at heart designed to be a set of cooperating pieces. The better the design, the more the pieces will be independent components, as minimally dependent on the others as possible. These are all reasons to think that the standards for "trivial" might be different in software.
Here then, at heart, are the sorts of problems with software patents that directly relate to how they affect practitioners of the art, and why patents are inappropriate. Many of these are interrelated, and can be perceived as alternate statements of the same problem.
As I indicated at the start of this section, the amount of effort invested in the techniques found in most software patents is insufficient to be deserving of a patent, based on the 'public good' standard: If I, as an expert practitioner in the subject area, could come up with the same solution after about eight hours of work, what is the reason for giving someone else the right to charge me $1,000,000 for reinventing their "technology"?
Even if it were to take me a week to reinvent the technique, the numbers just don't balance out reasonably.
What may confuse some people is that software development doesn't take a week. But software ideas don't need to be tested against the real world. They do have to be tested against themselves--there are certainly cases of software ideas that don't turn out to work-- and they may have to get tested against real-world data (e.g. a compression algorithm). But most of the software development process is not about refining the idea to make it do the job correctly, or sifting through a large number of engineering alternatives to find one that really works--most of the software development process is about manipulating other components while leaving the original core idea unchanged. Thus, the patent is granted for a small amount of work.
A common pro-patent argument against this issue is that the development cost isn't central--the goal is to award a monopoly so as to provide an incentive for this innovation in the first place. But the reality is that this sort of innovation occurs all the time, regardless of a patent incentive. There are cases of a little guy with a patent which protects his innovative "neat idea" from being ripped off by a big company, certainly; but they are very rare in the world of software. Most such things would be trivially protected by being a trade secret, and the open market already provides the motivation for innovation. (The biggest place where trade secrets do not help is when the subject matter is directly exposed to users, such as in the case of user interface design.)
Software is designed to be modular and to allow easy combination. A core programming concept is that of "data structure augmentation"--taking a basic data structure and adding extra data or another data structure to it.
Patents are granted to "inventors" who have chosen to combine two elements in a particular way that either nobody else has ever done, or which, at least, nobody has ever published. Courts have rejected prior art on the ground that it lacks an appropriate combination, even though such combination is trivial to practioners in the field.
The "backing store" patent covers the use of an obvious, familiar technique--copying data that is overwritten into a temporary storage area for later restoration.
There is no invention here. Deciding to use this algorithm for backing store has to be a novelty once--somebody has to do it first--but that doesn't make it inventive. In fact, if the patent had attempted a broader claim, it probably wouldn't have succeeded. Of course the above technique has been used in numerous other applications--for example in the creation of sentinels for linear search. But it had even been used in very similar problem domains: video games had already been using the identical algorithm to support animated bitmaps--saving and restoring the contents of the image underneath the animated object as it moved around the screen. Some windowing systems of the same time may have used this algorithm for drawing cursors to the screen.
There is novelty in using this algorithm for window restoration, but should it be patentable? The patent is being granted for the idea, which takes sixty seconds for someone to come up with. The implementation will take a lot longer, and there may even turn out to be tricky bits (e.g. correctly updating backing store when updating windows onscreen), but there's no trial and refinement of the process against the real world--nothing to learn or invent that actually would help other practioners trying to implement it. Publishing this information (which is what the one good the patent is supposed to provide) simply saves people that sixty seconds of work.
There is much discussion in the anti-software-patent literature about certain patents which seem to be widely interpreted as basically covering "any algorithm which uses this general set of techniques", for example some public key crypotgraphy patents. Of course, the legal language of the patent may not formally make such a broad claim, but the issue is if the patent is enforced in this manner and if the courts stand by and allow it to be--and the fact that most small software developers are innovative, but do not hold patents, and may not be able to afford a patent defense, so the mere threat of it being used in this way is enough to prevent such variants.
Closely related is a patent which apparently covers results, not process. The exclusive-or-cursor patent is a simple example of this. There are a number of mathematically equivalent ways of expressing the xor algorithm, but it appears to be widely believed that all of them are covered by the patent. The patent apparently is understood to cover the idea of representing an onscreen cursor by complementing each "visible" pixel of the cursor with the contents of whatever is on screen.
This is somewhat surprising, since it is predated by hardware implementations that do the exact same thing for text-only modes--rendering a block or underline cursor so its appearance is the same as above.
Let me show just the most trivial example of how there are multiple ways of accomplishing the same thing. Here are four implementations of block-cursor rendering for a 1-bit display:
for (y=0; y < y_height; ++y) { int loc = (y+y_off)*stride + x_off; vidmem[loc] ^= 255; /* XOR */ } for (y=0; y < y_height; ++y) { int loc = (y+y_off)*stride + x_off; vidmem[loc] = 255 - vidmem[loc]; /* subtraction */ } for (y=0; y < y_height; ++y) { int loc = (y+y_off)*stride + x_off; vidmem[loc] = ~vidmem[loc]; /* subtraction */ } for (y=0; y < y_height; ++y) { int loc = (y+y_off)*stride + x_off; for (x=0; x < 8; ++x) if (vidmem[loc] & (1 << x)) vidmem[loc] &= ~(1 << x); else vidmem[loc] |= (1 << x); }
Of course, all of these have exactly the same effect. Therefore, since computer programs are just math, these four are considered in the computing industry to be mathematically equivalent. Thus, none of them can really be considered to 'work around' an imaginary XOR patent which patents the first one. But the fourth algorithm is adaptable to the actual xor-cursor patent, since it can be modified only to update selected bits (the ones specified in a non-block cursor). Doing this to work around the patent would be bizarre, however, since the patented implementation is a natural and obvious implementation, leveraging the capabilities of the computer, and the workaround is grotesque, unnatural, and inefficiently uses the resources of the computer.
Is deciding to support an arbitrary pixel map for a cursor patentable? Is deciding to render that arbitrary pixel map as a logical complement of the screen pixel map patentable, given that was already being done for block cursors? Is the use of exclusive-or to achieve that effect patentable?
Of these, to me, the only clever idea was the original "render the block cursor by inverting what's already there", and the rest are all simple, small innovations that lots of people would have thought of with investment of a trivial amount of time. None of those innovations deserve patent protection.
Another reason patents are often trivial is because they offer a solution to a problem which only really has two or three plausible solutions--that is, plausible within the set of constraints imposed by most programming situations.
In this case, then, it's easy to believe that the patented solution is trivial, simply because anyone trying to solve the problem would come up with that approach. This is often the case when programmers learn that something familiar to them is patented. Patent advocates argue that programmers are forgetting that the familiar subject is familiar to them with experience, and that it really was an invention at the time. But programmers understand that if nobody has come up with a better way in ten years, then anyone tackling the problem would have come up with that solution with sufficient time. And usually, that solution is small enough (e.g. backing store or the exclusive-or cursor) that it's unreasonable to believe it would have taken significant effort.
This is not to say that there are no real inventions. The GUI was clearly a significant invention. It was a novel idea, and one which had to actually be implemented and tested against the real world of human beings trying to use it before it could be seen whether or not it was a valid idea. No doubt I could come up with a list of ten, maybe a hundred such ideas of clear novelty and non-obviousness. But not a hundred thousand.
A crucial problem with prior art is that somebody has to do something first. If the only acceptable prior art is somebody doing the exact same thing--if the PTO and the courts fail to recognize that techniques are trivial, obvious variations on old examples--then the person who does it first will always be able to get a patent if they file fast enough.
This is particular bad with software patents, because hardware evolves at a rapid pace. People don't bother attempting to solve problems in software until the appropriate hardware technology is available. For example, the League for Programming Freedom argues that backing store was trivial and well-known, but unpublished because it was trivial. While a few people were actually already using it at the time of filing, however, the lack of prior art is also clearly attributable to the fact that nobody was doing it a few years earlier--but this is due to the fact that the hardware wasn't there to support it, not because the idea was a noteworthy invention.
Sometimes the reason a patent is novel is because you're trying to solve a problem at all. Once you have that idea, there's a natural application of existing ideas to solve that problem, or a natural approach. But most of the time such an idea is undeserving of protection, because someone else would have tried to solve the problem. Often somebody else is actively solving the problem before the time the patent is disclosed.
A large number of "do this same old thing, but do it on the Internet" are probably examples of this sort of thing. (Typically, doing something over the Internet does require real changes to the 'same old thing', but those changes are almost always well-known or obvious changes necessary to make the same-old-thing work.)
A related problem is when an algorithm is derived as part of a proof process. For example, a person attempting to optimize an algorithm in assembly language can identify the crucial operations which much be accomplished, identify constraints between the operations, and derive a limit on the rate at which any code could possible execute that algorithm. A particular implementation is likely to fall out of that proof process.
Should such an implementation be patentable? It's not clear to me it should even be copyrightable. The revealing thought-experiment is to consider a practioner who optimizes the code without doing the proof, but evolves such a solution and then patents it.
I'm going to do something risky. I'm going to describe a process and try to indicate why it is trivial. The risk here is that a pro-patent reader will reject it out of hand--"of course the process you're describing is trivial, but the ones that are patented are not". I hope that any reader going into this with her eyes open will try to evaluate it honestly.
Most importantly, I'm not writing this as if it were a patent application. It is not my goal to try to show how language can obfuscate simplicity. My goal is to try to show how everyday programming involves creativity on a small scale--a scale on which it is inappropriate to patent--and yet for which patents are being granted.
I have a grid of numbers. To use compute programming terms, I have a matrix M[x,y], where x and y are bounded over some finite range.
My program has extremely constrained storage; my program can only store a single digit for the number found at location M[x,y].
Here is the problem: I wish to temporarily update elements of the matrix, adding a small number to several adjacent locations. Then I wish to be able to undo that operation. Moreover, I wish to be able to make several such updates in some order, and then undo them in a different order. Furthermore, the intermediate sums may be larger than can fit in a single digit.
What I'm actually implementing is an array which corresponds to a grid overlaid on a battlefield in a 3D graphical simulation. The values of the array correspond to the brightness of the light at any given location. The initial array represents the lighting due to sunlight (which never changes). Updated lighting values represent "dynamic lights"--temporary, local light sources which cast light on nearby terrain, e.g. due to explosions or other objects which cast light. Lighting is additive, so if two lights cast light in the same area, the light needs to sum together. The need for out-of-order-undo is to allow the local lights to turn off independently of each other.
Does any of this have any relevence to the solution to the problem? Not really (see next paragraph)--any solution to the abstract problem will solve the concrete problem. It is possible I could come up with some different thing to do to solve the concrete problem--something that wouldn't solve the abstract problem--but it would be a hack, and probably wouldn't be a very ideal solution, since it would have to by definition produce a different result than the abstract version would--but that would no longer accurately model the actual effect I'm attempting to simulate.
The one way in which the specific application is relevent is in evaluating trade-offs. I might come up with two or three possible solutions; some knowledge about the domain in which I will apply the solution might reveal that one of the trade-offs is better than the others, in that light.
A quick-and-dirty solution is to simply add in the new values, and when "undo"ing, subtract those same values back out. Because addition is both commutative and associative, you can prove that mathematically this has the correct effect. However, computers have limited precision, and at some point the number will either overflow. When this happens, I must either let the number wraparound (like a one-digit car odometer) or saturate (limit it so it can't get bigger then the maximum allowed value). If I let it saturate, I get a visually acceptable result initially--but the undo doesn't work correctly (saturating arithmetic isn't associative). If I let it wraparound, I get visually unaccepable results when the precision is exceeded, but it does return to the original value.
Computers are finite. At some point, you have to give in and say "I have enough precision". If I gave every element of the matrix two digits of precision, but displayed the values as if they had saturated upon exceeding the first digit, I might get an acceptable compromise--two digits is enough to guarantee it never wraps around, due to other constraints of the situation (e.g. I never add a number larger than 9, and I never have more than 10 updates active at once).
However, it is a constraint of this particular problem that I could not afford to significantly increase the precision of matrix elements. There were a very large number of elements, but a much smaller number of update operations active. As such, it was perfectly reasonable to store extra data associated with each update operation, but not for the entire matrix.
When I was given this problem, it had an already-existing system which had used a cheap hack to avoid some of the problem. The existing system used saturating addition, but avoided problems as long as two updates didn't overlap each other. However, it behaved incorrectly if two updates did overlap each other, exhibiting a saturating addition flaw. However, it was designed so that once all updates were undone, the final correct result was obtained--the saturating math bug was only visible when at least one update was still active.
For this reason, it had been an acceptable hack for a while, but at some point a situation arose where the effect was extremely visible--e.g. because one particular update stayed active permanently.
I will now sketch out the design of the existing hack.
Normally, after a saturating addition, a subtraction may go too far. If the starting value is 5, and the temporary value is 7, then 5+7 = 12, and 12-7 = 5. But if we saturate at a single digit, 5+7 = 9, and 9-7 = 2--so we will end up at 2 with this naive algorithm.
The "obvious" solution to this is to say, "well, but you didn't really add in 7--you only added in 2. So you should only subtract 2".
The reason this isn't done automatically is because the amount that is added in is computed in an abstract way. (In this particular implementation, it was sampled from a small table-- essentially this was bit blit of a bitmap that used addition as its operation.) Ideally, to undo it, we just subtract out the same thing.
The hack, then, was to keep track of how much was "really" added in. Each update stores a little local grid of how much it really added to each square, accounting for saturation, and then removing the update requires subtracting based on the little local grid, instead of the original "raw" formula for how much got added. Thus, each active update element has to provide storage for a little rectangle of values, which are the values that it added in.
The temporary hack works--it guarantees that when all updates are undone, all the original values are restored.
However, it has the problem that the operations are not strictly associative and commutative. With real math, 5+3+7-3-7 = 5. With saturating math, that blows up horribly (5+3+1-3-7). The new approach accounts for the saturation: 5+3+1-3-1 = 5.
However, the answer is incorrect at the point in time 5+3+7-3. The correct answer at this point is 12, saturated to 9. The value computed by the hacked approach is 6. This is because at the time the 7 is added in, the square has already been updated. The 7 saturates based on the updated value (5+3), not the original value 5.
The visual bug that was reported was this: light a temporary light of some sort. Create an overlapping, longer term light. Extinguish the temporary light. The places where the temporary light was look "too dark".
This is the point when the problem was put on my plate. It was my task to fix the bug.
Three plausible solutions suggest themselves quite rapidly.
I chose to pursue the third solution. The main reason for this is because it's actually somewhat simpler, due to modularity. The third approach involves enhancing the way the matrix encodes and represents values within the matrix, but does not require modifying the way updates are encoded, represented, or reported. The first technique requires "iterating through all updates", and the second technique requires knowledge of the rectangles associate with each update. Thus, the third solution is also more general (it could handle non-rectangular updates). In general, it removes the need for the hack--the need for the updates to store "corrected" values. Finally, it is potentially more efficient, because no extra processing need be done if the marker value isn't seen, whereas the other techniques must explicitly check for overlapping values. (On the other hand, in practice, there must be a check for the marker value, which may be less efficient.)
The largest possible value (in this example, 9) is chosen to be the "marker" value which indicates that the actual value needs more precision than is available. This dovetails cleanly with the desire to "saturate" the reported values at the maximum; other code which is not concerned with updating lighting values, merely with using their values, can directly use the values in the matrix without any further processing. That is, at all times the matrix always contains the correctly saturated values.
If an entry has the marker value, a separate data structure (in this case, a hash table) is used to look up values which have overflowed. When an update occurs, values from the matrix are retrieved. If the value is the marker value, then the "real" value is retrieved from the overflow table, updated, and stored back in the overflow table. If it is an undo update, and the value drops below the marker value, then it is written back to the matrix, and the overflow table entry is deleted. If the value in the matrix is less than the marker value, then it is updated directly. If this result is greater than or equal to the marker value, a new overflow table entry is created and updated directly.
Two trivial refinements suggest themselves instantly. They were not implemented because their benefit is minimal: first of all, if the update operation is an addition, then the test to see if there is already a marker present can be combined with the test to see if the newly computed value overflows, avoiding a test in the common case that overflow does not occur. (This generally isn't worth the extra code since the same thing can not be done on an undo operation, so the performance savings is minimal.) Another tweak is to avoid creating an overflow table entry if the marker value occurs naturally. This case can be distinguished from a true marker value (indicating extended precision) by the absence of an entry in the overflow table. Again, the extra code isn't worth the effort, since this is likely to yield a small space savings without a significant performance savings. If the matrix naturally (that is, initially) contained a significant number of natural marker values, this would be a very natural thing to do.
Actually, none of this is patentable in a trivial sense--there's probably prior art from 20 years ago documenting this solution to some variant of the abstract problem. I'm not sure what it would be for, but it seems to me like it's the sort of thing that would have arisen, especially in those days when memory was even tighter than it is now. But give me the benefit of the doubt--accept that this was an original (re-)invention of mine at the time I wrote it. Should it be a priori patentable (that is, independent of the question of prior art)?
This is the point where pro-patent people are going to argue that I've set up a straw-man--that nobody gets granted patents for stuff like the above, even if it is novel (that is, even if this person is the first person to come up with the technique). If they really think that--if they really think this isn't the sort of thing being patented--then there's no point in going on here. In my opinion, this is exactly the sort of problem that has come to light frequently in the software programming community. It may be that the patent is expressed in a tight context, rather than the abstract process, e.g: "storing and modifying an array of lighting values with enhanced precision". For example, the backing store patent was qualified to only apply to workstations which could run multiple processes at the same time, even though the identical algorithm was applicable to a single process running multiple windows--or even to a video game.
I happened to time how long it took me to "fix this bug" so I could write it up for my company newsletter in 1994. Thus, the measured time included: reaching a full comprehension of what the existing hack did and why it didn't work; brainstorming a solution; refining the details of the chosen solution; and implementing the solution, including programming, debugging, and testing.
The entire above process took me about one hour.
Ideas are nearly free. The fact that an idea must actually be executed completely and documented to qualify as prior art means that those ideas that I didn't use--somebody else could still patent them, even after reading this document.
While I, as far as I know, independently invented the chosen solution described above, I suspect such "overflow tables" have been used in other places. (The term "overflow table" itself I am borrowing from a different kind of overflow table, found in cases where a data structure runs out of built-in storage for elements and puts them into another data structure--a somewhat similar idea, but involving extra elements, not increased precision.)
Let's suppose I did know of prior uses of this abstract algorithm. If those prior uses were only described in the context of solving a specific problem, does that prevent them from being prior art for this particular solution? It shouldn't prevent them. It's inherent in the process of programming that programs can always be abstracted to solve the abstract problem, and then that problem can be concretized for particular applications. People shouldn't be obliged to formally, in public, describe the algorithm in the abstract terms to prevent such application-to-a-new-domain patents.
Mind-numbingly stupid example of specificity: the algorithm above could be described as a method for maintaining lighting values of points stored on a grid. Interestingly, the exact same project also stored a "height" value, which indicated (roughly) the altitude above sea level of that location on the grid--this was used to draw varying, rough ground, rather than a big flat grid. One thing we considered implementing was "real-time terrain deformation", in which a shock wave would travel visually across the screen, causing the ground to ripple, or an explosion would crater out a hole. Some of these effects needed to be undoable. It turns out that a natural solution to such effects would be to update the height grid in the exact same way the light grid was updated. If the height values happened to be near the maximum or minimum precision allowed, identical code to correctly handle running out of precision would need to be written. Trivial. Shouldn't get patented. But this is what happened with backing store.
This algorithm works if there is only a single item, rather than a matrix, but it is somewhat degenerate. It also would work for updating a one-dimensional array, a two-dimensional array (as described), a three-dimensional array, etc. Each of these variants is simple and trivial to create, given the original. Moreover, conceiving the idea of using the original in these other ways is obvious and trivial.
You can see in my description of why I chose the mechanism over the other two was because it was more modular. It wasn't necessarily more efficient or less code. But it did mean that the implementation chosen would work no matter how the updating system worked, rather than being tightly coupled with it. This is generally good programming design, which programmers do all the time, because in the long run it is more efficient--time is saved because compontents can be easily reused with other components.
A constraint on the solution is that it produce exactly a certain well-specified result in the matrix. All three of the proposed solutions do this (but the original hacked solution did not--hence the bug report). There are cases of patents being filed in which what is patented is effectively "any algorithm which manages to guarantee it leaves these results in the matrix". (In this case the algorithms work quite differently and have impose different constraints on surrounding software systems, but it is quite different for the exclusive-or cursor algorithm.)
Because of the constraint, it's not obvious that there are many other possible solutions to the problem. (There is probably a more minimal-update solution than the first two-proposed, but it would be a lot of work for minimal gain.)
Thus, anybody who tackles this problem is likely to come up with one of those three solutions. Why should any of them be patentable? If one of them were particularly natural and easy to do, or more flexible or general, why should that one be patentable?
It is widely believed by the computer programming community that most (if not all) software patents that have been granted are "trivially obvious", most likely in one of the sense above. Most disagreement seems to come from those with a vested interest in how patents put money in their own pockets, as opposed to improving the common good. (One could of course argue that I am anti-patents because they take money from my own pocket by making me pay licensing fees. If the patent system worked, I would be getting something of value in return for that--useful published information. There is no evidence that this is the case.)
Unfortunately, without explicit prior art, a practitioner in the field has no way of guaranteeing that they can defend themselves from a lawsuit, especially if they cannot afford one. Yet the grounds for trivialness are obvious. The LPF points out that those who had been using backing store prior to the filing of the patent would not have been able to continue using it had the patent been actively litigated. Some reforms are underway that might allows those who have previously invented the patented material to continue to use it. But this misses the fact that the idea was trivially obvious. And if the courts will not accept the existence of "simultaneous art", or prior art in different fields, as evidence of triviality, there is little recourse to dealing with an absurd patent.
Fundamentally, such reforms fail to address the absurdity of granting patents for subject matter that a skilled programmer can duplicate in less time and for less money than she could conduct a patent search--and for far less money then he would spend licensing the patent or in litigation disputing the absurd patent.
The system is broke, and it seems the PTO is financially rewarded for keeping it broken. I guess Congress is the only place to appeal to get this fixed. With companies like IBM making a lot of money licensing patents, and spending some of it lobbying Congress, though, that may be too much to ask for. More information on the financial sides of patents for the PTO and companies like IBM can be found on the League for Programming Freedom's patent site.
Copyright 1998-1999 Sean Barrett - sean at nothings dot org
[ HOME ]