
The Vigor of Big-O

March 26, 2020


TL;DR

Big-O notation is a way of describing how an algorithm's time complexity grows with input size. Although it is generally used to reason about CPU (time) usage, developers should also practice analyzing how much memory, disk, or network their algorithms consume. In the real world, only a fraction of your decisions will hinge on Big-O analysis, but expanding how we use this system can improve how teams plan and run sprints.


Preface

There are plenty of resources on Big-O notation on the web, but as a beginner, I found it hard to wrap my head around why such a mathematical concept has a place in programming. We’ll go over the conceptual idea of Big-O, understand the various notations, when to use them, and why you should care, and then introduce a new idea.



Semantics

What is the ‘O’ in ‘Big-O’? Oranges? Orthodontist? Oligarch? No one knows. Actually we do, but don’t ask Google.


The capital letter O stands for order of (in German: ”Ordnung von”) and was first used by the German number theorist Paul Bachmann in the second volume of his book on analytic number theory in 1894. The notation later gained popularity through the work of Edmund Landau, another German number theorist, with whom this nomenclature is widely associated today. See Asymptotic Notation.


Pursuit of Efficiency

Since I would simply be regurgitating what is easily available on the internet, I won’t write my own explanation of each complexity class.

The graphic below is from Developer Roadmaps and neatly summarizes the notations we use to describe how time complexity changes as input size grows.

[Figure: Big-O complexity chart from Kamran Ahmed’s Developer Roadmaps]

Big-O notation can be used for a variety of measurements when we want to understand the cost of running a piece of code: CPU (time) usage, memory usage, disk usage, and network usage. Most of computer science simply cares about CPU usage, i.e. time.

There are two distinct schools of thought here: performance and complexity.

Performance is the measurement of how much time, memory, disk, and network data a program actually uses when it runs. It depends on the hardware and the compiler as well as the code.

Complexity is how the resource requirements of an algorithm or program scale as the size of the problem being solved gets larger. Be aware that complexity affects performance, but not the other way around.
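To make the distinction concrete, here is a minimal sketch (the numbers and setup are my own, purely for illustration): complexity tells us a list scan is O(n) while a set lookup is O(1) on average, but the measured performance also depends on the machine, the interpreter, and constant factors.

```python
import time

def measure(fn, data, target):
    # Performance: wall-clock time on this particular machine.
    start = time.perf_counter()
    fn(data, target)
    return time.perf_counter() - start

def contains(collection, target):
    # Complexity depends on the data structure passed in:
    # list membership is O(n); set membership is O(1) on average.
    return target in collection

n = 1_000_000
print(measure(contains, list(range(n)), n - 1))  # slow: linear scan
print(measure(contains, set(range(n)), n - 1))   # fast: hash lookup
```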


The tOmOdOpO Notation

Consider a chunk of code that lets your app’s users search a database, but locally saves every search query and its result in a cache.

If asked to analyze the complexity of this code, most people will only consider how much time it takes, and hypothetically assign it an O(log n).

Depending on how the data is stored in the cache, or how many API calls we make to the server, it might not be that efficient for the client in other respects. Instead of saving each query in its own file, would it be better to save them all into one file? If they are all stored in a single file, should they be in a table, or as JSON objects? Most applications aren’t used just once but countless times, and in the long run, how you decide to store or transfer data between your clients and servers could end up being far costlier than it needs to be.
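To make the scenario concrete, here is a hypothetical sketch (the function names and cache layout are invented for illustration) of such a search wrapper. The lookup itself might be tO(log n), but notice the extra costs each call incurs: a network round-trip on every cache miss and a brand-new file on disk for every unique query.

```python
import json
from pathlib import Path

CACHE_DIR = Path("search_cache")  # hypothetical local cache directory
CACHE_DIR.mkdir(exist_ok=True)

def call_search_api(query: str) -> list:
    # Stand-in for the real network call.
    return [f"result for {query}"]

def search(query: str) -> list:
    # One file per query: disk usage grows with every unique search.
    cache_file = CACHE_DIR / f"{abs(hash(query))}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    results = call_search_api(query)        # network cost on every miss
    cache_file.write_text(json.dumps(results))
    return results

print(search("big-o"))  # miss: API call + new cache file
print(search("big-o"))  # hit: served from disk
```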

Instead of simply labeling your algorithm according to CPU usage, I propose a fresh system of prefixes for Big-O notations.

CPU (time) Usage - tO(n), where n is the input size

How does the amount of time taken to solve a problem scale as the input size increases? These are the familiar questions we already ask about CPU usage.
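As a refresher, a classic tO(log n) example: binary search halves the candidate range on every step, so the time taken grows logarithmically with input size n.

```python
def binary_search(sorted_items, target):
    # tO(log n): the search interval halves on each iteration.
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found

assert binary_search([1, 3, 5, 7, 9], 7) == 3
```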

Memory Usage - mO(n), describing peak RAM used as a function of input size n

How does the amount of RAM used increase when the problem to solve gets bigger? Does my algorithm use a lot more memory than it needs to?
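A small sketch of the memory question: both functions below compute the same sum, but one materializes every value at once (mO(n) peak memory) while the other streams them one at a time (mO(1)).

```python
def sum_squares_list(n):
    squares = [i * i for i in range(n)]   # mO(n): holds all n values at once
    return sum(squares)

def sum_squares_stream(n):
    return sum(i * i for i in range(n))   # mO(1): one value in flight at a time

assert sum_squares_list(1000) == sum_squares_stream(1000)
```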

Disk Usage - dO(n), describing storage space consumed as a function of input size n

How can we minimize the amount of storage used in the long run? Is there a way to reuse the same disk location instead of creating new files or data? Is there data that recurs across the files we create, and could we write it just once?
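Revisiting the hypothetical cache from earlier, a sketch of a disk-friendlier alternative: append every record to one shared file instead of creating a file per query, so disk growth tracks the data itself rather than per-file overhead.

```python
import json
from pathlib import Path

LOG_FILE = Path("search_cache.jsonl")  # hypothetical single append-only cache

def cache_result(query: str, results: list) -> None:
    # dO: one shared file, one line per query, no per-file overhead.
    with LOG_FILE.open("a") as f:
        f.write(json.dumps({"q": query, "r": results}) + "\n")

cache_result("big-o", ["result 1", "result 2"])
```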

Network Usage - pO(n), describing packet data transferred as a function of input size n

Is my algorithm or program making too many API calls, or transferring redundant data through the servers? How can we make it cheaper for our servers or for our clients?
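A sketch of the network question, using an invented batch-capable fetch function for illustration: requesting items one at a time pays a full round-trip per item, while batching amortizes that overhead across many items.

```python
def fetch_one_by_one(ids, fetch):
    # One round-trip per item: request overhead repeated len(ids) times.
    return [fetch([i])[0] for i in ids]

def fetch_batched(ids, fetch, batch_size=100):
    # ceil(len(ids) / batch_size) round-trips: far fewer headers on the wire.
    results = []
    for start in range(0, len(ids), batch_size):
        results.extend(fetch(ids[start:start + batch_size]))
    return results

# 'fake_fetch' stands in for a real API client that accepts a list of ids.
fake_fetch = lambda batch: [f"record {i}" for i in batch]
assert fetch_one_by_one(range(5), fake_fetch) == fetch_batched(range(5), fake_fetch)
```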

So far, my team at Makeshack has found this line of thought very beneficial when discussing applications, as we now take a more well-rounded approach to optimization than we did before.

Additionally, there could be other factors we can analyze using Big-O.

Cost of Usage - $O(n), describing the cost of services as a function of usage n

This is connected to network usage. Using services like Google Cloud, AWS, or Firebase can cost a lot of money, and understanding how expenditure scales because of our code could provide fresh insight.
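A back-of-the-envelope sketch of that insight (the prices are made-up placeholders, not real cloud rates): once you know the per-call price, the algorithm’s call count translates directly into a bill.

```python
COST_PER_CALL = 0.0000004        # hypothetical price per invocation, in dollars
CALLS_PER_USER_PER_DAY = 200     # hypothetical usage pattern

def monthly_cost(users: int) -> float:
    # $O(n): spend scales linearly with active users under per-call pricing.
    return users * CALLS_PER_USER_PER_DAY * 30 * COST_PER_CALL

print(monthly_cost(1_000))      # a few dollars
print(monthly_cost(1_000_000))  # 1000x the users, 1000x the bill
```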

Human Cost - hO(n), describing communication overhead as team size n grows

This factor evaluates how teams communicate, with the aim of avoiding unnecessary communication threads. In any team, the number of possible pairwise interactions grows quadratically with every extra member: n(n-1)/2 channels for n people. A team of 5 has 10 possible discussion threads, whereas a team of 100 has 4,950. And because humans don’t talk about just one topic, the real number could be much higher. The question is how we can optimize these discussions so that noise is removed and efficient discussion is possible. These are very hard problems to fix, but something to consider.
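The pairwise-channel count behind those numbers, as a quick check:

```python
def channels(team_size: int) -> int:
    # Possible pairwise discussion threads: n(n-1)/2, quadratic in team size.
    return team_size * (team_size - 1) // 2

print(channels(5))    # 10
print(channels(100))  # 4950
```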


The Holistic Score

If the four primary factors (time, memory, disk, network) are considered and evaluated during production, any team would benefit from the discussion and the unique solutions it brings forward. Sure, there are kinks and loopholes to figure out, but I hope to flesh the idea out with the advice of my peers and mentors over time. Some of the complexities we assign when talking about CPU usage will probably not be applicable to other resources, but this is a good starting point for discussion between programmers.

Once we have evaluated our code using these four factors, we take the worst of them and assign that as the Big-O notation for our algorithm or program. This approach lets us look at the efficiency of our code from fresh perspectives and identify a more holistic score for it.

tO(log n) + mO(1) + dO(2^n) + pO(n!) => O(n!)
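One way the scoring could be mechanized, as a rough sketch (the ranking table is a simplification that only covers common classes): rank each factor’s complexity class by growth rate and report the fastest-growing one as the holistic score.

```python
# Common complexity classes ordered from slowest- to fastest-growing.
GROWTH_ORDER = ["1", "log n", "n", "n log n", "n^2", "2^n", "n!"]

def holistic_score(factors: dict) -> str:
    # The worst (fastest-growing) class among the factors wins.
    worst = max(factors.values(), key=GROWTH_ORDER.index)
    return f"O({worst})"

assert holistic_score({"t": "log n", "m": "1", "d": "2^n", "p": "n!"}) == "O(n!)"
```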


I’d love to get feedback and suggestions for improvement. Tweet me at SimonLikelySaid or ping me on Keybase.


Relevant xkcd:

Travelling Salesman Problem #399


