Saturday, April 1, 2017

Hash table Basic Introduction


Hash tables are one of the most commonly discussed data structures in Computer Science. Hash tables, in short are used to identify a particular object from a set of similar objects.

They are more efficient than other data structures used for searching, like trees and other tabular searching structures.

Note that all of the books can be represented easily in each of the shelves. But suppose you wanted to identify each of the books without having to search too much. Suppose you wanted to identify each of the books automatically with just the title, i.e., identify the book at O(1) efficiency.

We can evenly divide the books into two or three groups as shown here, but this does not improve the efficiency of the search. The way to do this is to run a computer program that will automatically give you an identity for each book. This identity is the same identity for the book every time you look up the title. For convenience, let us suppose the rows of the bookshelf are lined with numbers and the columns are lined with alphabets, like in Excel.



This computer program gives you the shelf number (row number) and the slot number (column number). This is called a hash algorithm. The key, in this example is the title of the book. So essentially, you pass the title through the hash function and it spits out some value, which is where you would place the book. Suppose the hash function is the sum of the alpha values of the first word of the title that isn't 'a' or 'the' or 'an', using the scale a=1, b=2, c=3..., z=26.

We can notice some obvious problems here, but first, let's make sure we have our understanding of the visual hash table function and book example correct (Note: The diagram below is supposed to be linear, and the lines are connecting 8 to 9, 16 to 17, 24 to 25, 32 to 33, and 40 to 41).





This is what the hash table looks like properly
From here, we can see that the hash table is linear, from 1 to 48. Suppose for a second we only had 48 titles, and each title summed up to a different number, all within the range 1-48. This is almost an impossible concept, but bear with me. If that were the case, then each book would go neatly into each of the slots with no repetition.
But alas, 48 titles are bound to have some sums that are very repetitive and some titles that aren't used at all. So what would we do with titles that sum up to the same number? This is called a collision. Speaking in terms of a bookshelf, we could place the title next to the title with the same sum. Then, in some cases, there would be more than 1 book per slot, and in some cases  0 books per slot.
Now suppose we title sums that exceed the number 48. Suppose we have a title sum that is 126. In this case, we would calculate the modulo, to see where the number lands when it wraps around the hash table :



This should give you an intuitive idea of what a hash table does.
Next will be an intuitive understanding of the design of a hash table and an implementation of a hash table in C.

Wednesday, March 29, 2017

Binary Bomb: Phase 5



Phase 5:

We know the drill by now!
Let's take a look at the assembly.
08048dda <phase_5>:
 8048dda:   83 ec 2c                sub    $0x2c,%esp
 8048ddd:   8d 44 24 1c             lea    0x1c(%esp),%eax
 8048de1:   89 44 24 0c             mov    %eax,0xc(%esp)
 8048de5:   8d 44 24 18             lea    0x18(%esp),%eax
 8048de9:   89 44 24 08             mov    %eax,0x8(%esp)
 8048ded:   c7 44 24 04 a9 a7 04    movl   $0x804a7a9,0x4(%esp)
 8048df4:   08
 8048df5:   8b 44 24 30             mov    0x30(%esp),%eax
 8048df9:   89 04 24                mov    %eax,(%esp)
 8048dfc:   e8 ff fa ff ff          call   8048900 <__isoc99_sscanf@plt>
 8048e01:   83 f8 01                cmp    $0x1,%eax
 8048e04:   7f 05                   jg     8048e0b <phase_5+0x31>
 8048e06:   e8 0a 07 00 00          call   8049515 <explode_bomb>
 8048e0b:   83 7c 24 18 07          cmpl   $0x7,0x18(%esp)
 8048e10:   77 66                   ja     8048e78 <phase_5+0x9e>
 8048e12:   8b 44 24 18             mov    0x18(%esp),%eax
 8048e16:   ff 24 85 98 a5 04 08    jmp    *0x804a598(,%eax,4)
 8048e1d:   b8 00 00 00 00          mov    $0x0,%eax
 8048e22:   eb 05                   jmp    8048e29 <phase_5+0x4f>
 8048e24:   b8 e1 03 00 00          mov    $0x3e1,%eax
 8048e29:   2d 0b 01 00 00          sub    $0x10b,%eax
 8048e2e:   eb 05                   jmp    8048e35 <phase_5+0x5b>
 8048e30:   b8 00 00 00 00          mov    $0x0,%eax
 8048e35:   05 b8 00 00 00          add    $0xb8,%eax
 8048e3a:   eb 05                   jmp    8048e41 <phase_5+0x67>
 8048e3c:   b8 00 00 00 00          mov    $0x0,%eax
 8048e41:   2d 25 01 00 00          sub    $0x125,%eax
 8048e46:   eb 05                   jmp    8048e4d <phase_5+0x73>
 8048e48:   b8 00 00 00 00          mov    $0x0,%eax
 8048e4d:   05 25 01 00 00          add    $0x125,%eax
 8048e52:   eb 05                   jmp    8048e59 <phase_5+0x7f>
 8048e54:   b8 00 00 00 00          mov    $0x0,%eax
 8048e59:   2d 25 01 00 00          sub    $0x125,%eax
 8048e5e:   eb 05                   jmp    8048e65 <phase_5+0x8b>
 8048e60:   b8 00 00 00 00          mov    $0x0,%eax
 8048e65:   05 25 01 00 00          add    $0x125,%eax
 8048e6a:   eb 05                   jmp    8048e71 <phase_5+0x97>
 8048e6c:   b8 00 00 00 00          mov    $0x0,%eax
 8048e71:   2d 25 01 00 00          sub    $0x125,%eax
 8048e76:   eb 0a                   jmp    8048e82 <phase_5+0xa8>
 8048e78:   e8 98 06 00 00          call   8049515 <explode_bomb>
 8048e7d:   b8 00 00 00 00          mov    $0x0,%eax
 8048e82:   83 7c 24 18 05          cmpl   $0x5,0x18(%esp)
 8048e87:   7f 06                   jg     8048e8f <phase_5+0xb5>
 8048e89:   3b 44 24 1c             cmp    0x1c(%esp),%eax
 8048e8d:   74 05                   je     8048e94 <phase_5+0xba>
 8048e8f:   e8 81 06 00 00          call   8049515 <explode_bomb>
 8048e94:   83 c4 2c                add    $0x2c,%esp
 8048e97:   c3                      ret  

Off the top of my head, none of this really makes sense to me. There's a lot of calls to <phase_5+0xba> or some other call. There are many comparison and jump calls. I do however, see a call to scanf.
I will head to gdb to test my "test string" and see what happens there
   0x08048dda <+0>:     sub    $0x2c,%esp
   0x08048ddd <+3>:     lea    0x1c(%esp),%eax
   0x08048de1 <+7>:     mov    %eax,0xc(%esp)
   0x08048de5 <+11>:    lea    0x18(%esp),%eax
   0x08048de9 <+15>:    mov    %eax,0x8(%esp)
   0x08048ded <+19>:    movl   $0x804a7a9,0x4(%esp)
   0x08048df5 <+27>:    mov    0x30(%esp),%eax
   0x08048df9 <+31>:    mov    %eax,(%esp)
   0x08048dfc <+34>:    call   0x8048900 <__isoc99_sscanf@plt>
   0x08048e01 <+39>:    cmp    $0x1,%eax
   0x08048e04 <+42>:    jg     0x8048e0b <phase_5+49>
   0x08048e06 <+44>:    call   0x8049515 <explode_bomb>
   0x08048e0b <+49>:    cmpl   $0x7,0x18(%esp)
   0x08048e10 <+54>:    ja     0x8048e78 <phase_5+158>
   0x08048e12 <+56>:    mov    0x18(%esp),%eax
   0x08048e16 <+60>:    jmp    *0x804a598(,%eax,4)
   0x08048e1d <+67>:    mov    $0x0,%eax
   0x08048e22 <+72>:    jmp    0x8048e29 <phase_5+79>
   0x08048e24 <+74>:    mov    $0x3e1,%eax
   0x08048e29 <+79>:    sub    $0x10b,%eax
   0x08048e2e <+84>:    jmp    0x8048e35 <phase_5+91>
   0x08048e30 <+86>:    mov    $0x0,%eax
   0x08048e35 <+91>:    add    $0xb8,%eax
   0x08048e3a <+96>:    jmp    0x8048e41 <phase_5+103>
   0x08048e3c <+98>:    mov    $0x0,%eax
   0x08048e41 <+103>:   sub    $0x125,%eax
   0x08048e46 <+108>:   jmp    0x8048e4d <phase_5+115>
   0x08048e48 <+110>:   mov    $0x0,%eax
   0x08048e4d <+115>:   add    $0x125,%eax
   0x08048e52 <+120>:   jmp    0x8048e59 <phase_5+127>
   0x08048e54 <+122>:   mov    $0x0,%eax
   0x08048e59 <+127>:   sub    $0x125,%eax
   0x08048e5e <+132>:   jmp    0x8048e65 <phase_5+139>
   0x08048e60 <+134>:   mov    $0x0,%eax
   0x08048e65 <+139>:   add    $0x125,%eax
   0x08048e6a <+144>:   jmp    0x8048e71 <phase_5+151>
   0x08048e6c <+146>:   mov    $0x0,%eax
   0x08048e71 <+151>:   sub    $0x125,%eax
   0x08048e76 <+156>:   jmp    0x8048e82 <phase_5+168>
   0x08048e78 <+158>:   call   0x8049515 <explode_bomb>
   0x08048e7d <+163>:   mov    $0x0,%eax
   0x08048e82 <+168>:   cmpl   $0x5,0x18(%esp)
   0x08048e87 <+173>:   jg     0x8048e8f <phase_5+181>
   0x08048e89 <+175>:   cmp    0x1c(%esp),%eax
   0x08048e8d <+179>:   je     0x8048e94 <phase_5+186>
   0x08048e8f <+181>:   call   0x8049515 <explode_bomb>
   0x08048e94 <+186>:   add    $0x2c,%esp
   0x08048e97 <+189>:   ret   

Note the following:
(gdb) x/s $eax
0x804d960 <input_strings+320>:  "test string"
(gdb) x/s 0x804a7a9
0x804a7a9:  "%d %d"
(gdb)
This gives us a good idea of the format required for this phase: Two numbers:
I may have gotten  a bit lucky with this one. I randomly guessed two because it was greater than 1, as can see by the
0x08048e01 <+39>:   cmp    $0x1,%eax
statement. But the other guess took some work. I again, worked my way through a random number in gdb, until i kept seeing eax with 0xffffc3 or some address like it and the integer -109 next to it.
Then I re-examined the disas code. There was:
0x08048e30 <+86>:   mov    $0x0,%eax
0x08048e35 <+91>:   add    $0xb8,%eax
This was also paired with many add $0x125 and sub $0x125, but ultimately each canceled out till all was left with sub $0x125. Subtraction of 0xb8-0x125 gives the integer -109, which works with this phase.
So the answer: 2 -109