Wednesday, February 5, 2014

NFA to DFA Converter (Ruby)

Every nondeterministic finite automaton (NFA) has a deterministic counterpart, called a deterministic finite automaton (DFA). Finding the DFA for an NFA is trivial for simple machines, but the problem quickly becomes difficult as more states and symbols are added. This program automates the process, not only for checking homework problems, but also so that different methods of constructing DFAs can be tested (a future post will detail those findings).

I will talk briefly about how I've accomplished this, but first a little about NFAs and DFAs.

NFAs are finite state machines used to determine if a given input is accepted or rejected. They are defined formally as 

M = (Q, sigma, delta, q0, F)

Where 
  • Q = states within the NFA (nodes)
  • sigma = machine alphabet (possible symbols)
  • delta = transitions between states (edges)
  • q0 = initial state (starting point)
  • F = set of accepting (final) states (end points)
Note: DFAs follow essentially the same definition, except that delta must supply exactly one transition from each state for each symbol in the alphabet, and lambda transitions are not allowed.

Here's an example of a possible NFA: M = ({q0, q1}, {0,1}, delta, q1, {q0}) where delta is defined as: 

-----------------------------------
delta(q0, 0) = q0
delta(q0, lambda) = q1
delta(q0, 1) = q1
delta(q1, 0) = q0
-----------------------------------

The resulting NFA would look like:


This NFA would accept strings like 0, 010, 000 and so on.
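You can check acceptance by simulating the NFA directly, tracking the set of states the machine could currently be in. Here's a throwaway Ruby sketch (the names are mine, not the project's; "l" stands for lambda, matching the CSV encoding used later):

```ruby
require 'set'

# Transition table for the example machine: [state, symbol] => successors.
DELTA = {
  ['q0', '0'] => ['q0'],
  ['q0', 'l'] => ['q1'],
  ['q0', '1'] => ['q1'],
  ['q1', '0'] => ['q0'],
}

# lambda-closure: every state reachable through "l" edges alone
def lambda_close(states)
  todo = states.to_a
  seen = Set.new(states)
  until todo.empty?
    q = todo.pop
    (DELTA[[q, 'l']] || []).each { |r| todo << r if seen.add?(r) }
  end
  seen
end

def accepts?(input, start: 'q1', accept: Set['q0'])
  current = lambda_close(Set[start])
  input.each_char do |c|
    moved = current.flat_map { |q| DELTA[[q, c]] || [] }.to_set
    current = lambda_close(moved)
  end
  current.intersect?(accept)
end

puts accepts?('000')   # => true
```

The machine never commits to a single path: after each symbol, `current` holds every state the NFA might occupy, which is exactly the idea the DFA construction below makes permanent.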

The goal of the program is to turn this NFA into its corresponding DFA. The program does this by following what is called a "proof by construction," which shows that for every NFA there exists an equivalent DFA. The proof is (roughly) as follows:
  1. Create a graph whose initial vertex is the lambda-closure of the NFA's initial state.
  2. Repeat the following steps until the number of edges leaving each vertex equals the cardinality of the alphabet:
    • Take any vertex {qi, ..., qk} that is missing an edge for some a in sigma.
    • Compute the union delta*(qi, a) U ... U delta*(qk, a). This union is an element of the power set of Q, so it names a vertex of the new graph; add that vertex if it is not already present, and add an edge to it from {qi, ..., qk} labeled a.
Any vertex whose subset contains a final state of the NFA becomes a final state of the DFA.
That is basically what this program does. The NFA would be fed in as a CSV like so:

q0,q0,0
q0,q1,l
q0,q1,1
q1,q0,0

(with a few preceding lines for the initial state, etc.) and produces the following DFA:
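Here is a minimal Ruby sketch of the subset construction (hypothetical names, not the project's actual source; transitions mirror the CSV rows, with "l" encoding lambda):

```ruby
require 'set'

# Subset construction: each DFA state is a Set of NFA states.
# transitions is an array of [from, to, symbol] rows, as in the CSV.
def nfa_to_dfa(transitions, start, alphabet)
  # delta[state][symbol] => Set of successor NFA states
  delta = Hash.new { |h, q| h[q] = Hash.new { |h2, a| h2[a] = Set.new } }
  transitions.each { |from, to, sym| delta[from][sym] << to }

  # lambda-closure: every state reachable through "l" edges alone
  closure = lambda do |states|
    todo = states.to_a
    seen = Set.new(states)
    until todo.empty?
      q = todo.pop
      delta[q]['l'].each { |r| todo << r if seen.add?(r) }
    end
    seen
  end

  start_set = closure.call(Set[start])
  dfa  = {}              # DFA state (a Set) => {symbol => successor Set}
  work = [start_set]
  until work.empty?
    current = work.pop
    next if dfa.key?(current)
    dfa[current] = {}
    alphabet.each do |a|
      # union of the moves on `a` from every state in the subset, then close
      target = closure.call(current.flat_map { |q| delta[q][a].to_a }.to_set)
      dfa[current][a] = target
      work << target
    end
  end
  [start_set, dfa]
end

nfa = [%w[q0 q0 0], %w[q0 q1 l], %w[q0 q1 1], %w[q1 q0 0]]
start_set, dfa = nfa_to_dfa(nfa, 'q1', %w[0 1])
```

Missing moves collapse into the empty set, which becomes the DFA's trap state, so every vertex ends up with exactly one edge per symbol.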



Unfortunately this graph doesn't indicate which are initial/final states, but that will come soon (hopefully).

Here's a really crazy example of a DFA it churned out!




The source for this project is available under MIT license here: https://bitbucket.org/saulcosta18/nfa_to_dfa
DISCLAIMER: this code is still very rough!

Feel free to comment with questions/comments.

Cheers!






Wednesday, December 25, 2013

Question: Can you explain what Hadoop is and what some real world applications are?


Answer:

The only thing I've done that relates to Hadoop is the MapReduce algorithm. MapReduce is a way to take a large dataset and boil it down into a smaller, more useful one. An example might be the following:

Say you have a database full of records. Each of these records corresponds to an instance of a user (maybe on Spotify) listening to an artist's song. So if I listen to Eminem's "Rap God" at 4PM, 5PM, and 10PM, there will be three different records in the database. It might look something like this:

user_id  song_id  time_played
-------  -------  -----------
1        2        16:00:00
1        2        17:00:00
1        2        22:00:00

Having data in this format isn't really that useful if you have millions or billions of records in the database and want to see, for example, how many times "Rap God" has been played. We could do a database query and ask for a COUNT of the number of records that there are for the song, but if there are a lot of records this could take a long time (even in an indexed database, which won't always be the case).

So because of this we use algorithms like MapReduce to reduce the data into a more manageable format. MapReduce has three phases: Map, Sort, and Reduce. The basic principle is that the Map step emits each record that matches our criteria (a play of "Rap God") followed by a "1". So the records above would be mapped and output as:

2, 1
2, 1
2, 1

where the 2 is the song's ID and the 1 is the 1 used in the Reduce step.

This output from the Map step is then sorted (in this case, by song_id). This is the most time-consuming and important step of MapReduce: the Reduce step doesn't keep a running tally of all the songs it finds as it goes through the mapped data; it only cares about the run of identical keys it is working on at that point. I'll illustrate this in a moment.

So all Reduce does is sum the 1s for each run of identical keys. In our case the Map output would be reduced to something like:

2, 3

(note that there were 3 rows). Now here's an example with some unsorted output:

2, 1
2, 1
3, 1
2, 1

The Reduce step for this output would produce:

2, 2
3, 1
2, 1

This is wrong because the input to Reduce was not sorted: the rows for song 2 arrived in two separate runs, so they were counted separately.
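The whole Map → Sort → Reduce flow can be simulated in a few lines of Ruby (an illustrative toy with made-up data, not Hadoop's actual API):

```ruby
records = [
  { user_id: 1, song_id: 2, time_played: '16:00:00' },
  { user_id: 1, song_id: 2, time_played: '17:00:00' },
  { user_id: 1, song_id: 3, time_played: '09:00:00' },
  { user_id: 1, song_id: 2, time_played: '22:00:00' },
]

# Map: emit a [song_id, 1] pair for every play
mapped = records.map { |r| [r[:song_id], 1] }

# Sort: bring identical keys together so Reduce can stream over runs
sorted = mapped.sort_by(&:first)

# Reduce: sum the 1s within each run of identical keys
reduced = sorted.chunk_while { |a, b| a.first == b.first }
                .map { |run| [run.first.first, run.sum(&:last)] }

p reduced   # => [[2, 3], [3, 1]]
```

Skip the `sort_by` line and `chunk_while` sees song 2 as two separate runs, reproducing the wrong output shown above.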

To answer your overall question about applications of MapReduce: it allows raw data (such as song listens) to be turned into valuable information. Some of the questions we might answer using it include:

a) How many times was Rap God listened to?
b) How many songs has Saul listened to?
c) How many times has Rap God been listened to in the last day?

Once we have the overarching MapReduce system built, we can easily change the criteria the Map step looks for (different songs, date ranges, etc.) to answer the different questions.

Another great thing that makes MapReduce so powerful is that it allows for what is called "parallel computing." Picture an architecture with one master computer and 25 slave computers. The master breaks the data we want to process into pieces and hands each piece to a slave to process. This allows us to utilize more resources simultaneously than would be possible on a single computer.
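How the master might assign map output to workers can be sketched like this (a hypothetical scheme; Hadoop's default partitioner similarly hashes the key modulo the number of reducers):

```ruby
NUM_WORKERS = 4

# [song_id, 1] pairs from the Map step
pairs = [[2, 1], [2, 1], [3, 1], [2, 1], [7, 1]]

# Partition by key so every pair for a given song lands on the same
# worker; each worker can then sort and reduce its share independently.
partitions = pairs.group_by { |key, _| key % NUM_WORKERS }

p partitions   # => {2=>[[2, 1], [2, 1], [2, 1]], 3=>[[3, 1], [7, 1]]}
```

Because all pairs sharing a key go to the same worker, each worker's local reduce result is already the global answer for its keys, with no cross-worker merging needed.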

In theory this means we will always be able to scale up the number of slave nodes to handle more data. In practice this isn't always true for extremely large datasets without specialized system configurations (by large datasets I mean multiple terabytes). 

Hadoop has other components for data processing. I haven't worked with them extensively, but my understanding is that some are hybrid approaches between database queries and the technique I outlined above.

Tuesday, April 3, 2012

Summer Research Grant

A couple of months ago I wrote a grant proposal to Norwich University for a summer research project. It was for an Intrusion Prevention System (IPS) that will be implemented on Brain-Computer Interfaces (BCIs). Basically, it is so that as humans interface more directly with computers using their brains, this IPS will prevent malicious programs (or people) from compromising their intellectual integrity (e.g., monitoring their thoughts, controlling their actions, even killing them).

I was really concerned that I wouldn't get the grant, because it's very technical and kind of futuristic, and the people on the board are not at all technical and usually only approve projects that have applications at this point in time. I worked very hard (and got a lot of great help from Dad) to write it in a way that was easy to understand and showed that it was something that could be used in the near future. 

And I guess it worked, because I just found out today that I got the grant! So now I'll be staying here at NU this summer for ten weeks after this semester finishes. The pay for the ten weeks is only $4,000, and as this isn't as much as I could make elsewhere, I am working on a second grant to DARPA that will provide me some additional funding, as well as funding to purchase some equipment that the project will require. The professor advising me on this project and I are fairly certain I'll get the DARPA grant, as they are technical people and this is something they would be very interested in. As part of this research project I'll be writing an in-depth research paper that I hope to have accepted to a conference in this field, which would be a great opportunity.

So yeah! I'm very happy. I worked really hard on the grant, and I'm glad it paid off. I really wanted to spend my summer doing something relating to my field, and I'm glad God has blessed me with this opportunity. 

Hope all is well,
Saul