What I Learned From Studying Algorithms and Data Structures (AlgoDS 2)

5 min readJun 13, 2017

Continuing from last post

Now let’s walk through the standard curriculum. Again I’m not a CS major but I’ve taken the Algorithms I course on Coursera (as well as read most of the textbook) and I have also read Cracking the Coding Interview. I did try Introduction to Algorithms but it’s a bit too deep for me. The mathematical Big O analyses and statistical calculations are still over my head and I have no clear reason why I would ever need to calculate them.

Why I Think We Start with Sorting

It’s kind of an introduction. You learn that there are many different ways to do things:

Some better than others
Some that are non-obvious (merge sort, quick sort)
That in different situations, using a different implementation actually makes sense

It shows that there are different approaches to solving a problem and sometimes you actually do need to pick the correct one.

I remember my first technical interview. I didn’t even know it was one. The interviewer asked me, write me a function to sort an array.

I said “Easy!” and then wrote something like:

List list = new List(arrayOfIntegers);
list.Sort();
return list.ToArray();

He looked at me funny and then said what is Sort()?

I didn’t know it at the time but basically brute forced Selection Sort. Yea, I didn’t get the job.

Big O: The Concept

Sorting also introduces time complexity. There are textbook ways to demonstrate the impact, compare the run times of the different sorting Algorithms but I never really got the point as all I needed to know is list.Sort()

I never understood it until, combined with data structures, I actually used it although I didn’t know it back then.

The Problem

I needed to parse web server logs yea again though this was actually the first time I wrote a Parser.

As part of the logic, it needs to normalize each URL. It does this by matching the actual URL with a list of regex patterns.

With a bit of trial and error (regex was new to me back then too), I built it and it worked… but it was very slow.

It was parsing 2,000,000 requests which took around 2 hours…

The Brute-Force Solution

The URLs can actually be grouped and for each request I can quickly find out which group it belongs to. What if I used a Dictionary of Lists?

JACKPOT! The script now takes 30 minutes. Good enough for me.

Data Structures and Big O Way

What if I used something like a tree. It only needs one level and the the nodes can be lists.

Let’s pretend I don’t know about BST and that search only takes log N.

Let’s see, if I used a Dictionary of size D, lookup time will be O(1).

If the URLs are evenly distributed, each node will be a list with size N/D. Lookup time will be O(N/D)

So assuming the coefficient for grouping is small is small

O(1) + O(N/D) is better than O(N) by a factor of D which translated to a lot.

Again a real tree would’ve been better but I didn’t know about it back then. I sort of reinvented an interior wheel.

Basically if you know some Big O, you can look at the problem differently and you can ask better questions:

Is there something I could do to make this logic better, faster than O(N)?
Which data structures and algorithms might be able to do it?
Should I do anything with the data (ordering, randomize) to make it work better?

In summary, I see Big O as more a language, to:

Pick the best tools to use
Analyze a particular problem
Get a gist of how much time it takes
Identify if it’s possible to do better

Graphs: DFS, BFS, Better Searching

So I’m actually not too interested in graphs themselves. How to create a MST, implement a BST. But I think it serves 2 purposes:

Adds to your tool belt algorithms that efficiently solve hard problems, so you don’t end up implementing inferior wheels
A different way of thinking about problems; solutions can actually be built on top of each other

Loops Invariants, Recursion: Solving the Problem, Not the Language

So I’ve been using OOP languages all the time and there’s one Golden Rule you learn very quickly:

If you call a function from itself too many times, you WILL get a StackOverflow Exception. Basically, just don’t do it. ALWAYS use a loop.

So when I see a recursive problem, I usually think:

How do I solve this with a for/while loop?
What temporary variables outside the loop do I need to keep track state?
What conditions should cause early termination?

It’s not too bad but the first thing I’m doing is translating the problem to a specific language.

Compared to actually solving the problem:

Let’s start with some simple examples, what are the simplest cases and what do you expected/want to happen?
Now a more complicated one. Hm they seem to build on top of each other.
Are there/should there be any conditions that will always be true?

If you write a recursive function, the structure is very similar to the actual problem. I’ve actually found for simple search functions I need to write, I think and write the recursive version.

The whole thought process is just much clearer though the code may be more confusing, if you don’t comment it, listing out the rationale and assumptions, the loop invariants.

So now my approach is:

Solve the problem on paper
Implement using recursion, make improvements as needed
Optimize for the language, if needed

Another Example of a Inferior Wheel

Once I wrote a simple scheduling app. Microsoft Project and similar programs were overkill. Basically I just needed it to generate release runbook with timing of each step given they’re dependencies and timing.

My natural approach:

Encapsulate each Step as an object that hold the IDs of the ones that it depends on
Set the Start Time of the first Step
This will update their End Time
Find all Steps that directly depends on it
Update the times of each, repeat #4

Assuming there’s no cycles, it takes a long time to run, maybe because of all the searching and the deep nesting due to the dependencies like:

A->B->C->D->B->C->D->…

If I knew all the graph stuff though:

This can be all represented in a graph
There is an existing, fast algorithm that checks for cycles
This can be basically solved by first doing a topological sort on all Steps (vertices)

I think 90% of my problem has already been solved in some way… before I was even born.

This kind also goes back to the growth mindset. You can’t assume you know everything.

Alright that’s all folks… for now. Off to breakfast and then work :)