So apart from the CS50 problem sets that I try to do every 4-5 weeks and a VBA project at work that I'm really not satisfied with, I haven't touched any code in close to 6 months. I worked on the Google Code Jam questions, but purely on paper.
This weekend was ITU Cekirdek's Hackathon, and I decided to join, with a friend acting as my field agent. We decided to build a "real-time" app (when I say real time, it still means I have to SSH into an old machine from 2006 to run scripts, because automation too stronk) that would take accelerometer data from my friend's phone, detect his left/right turns, and use the time differences between these turns to figure out the link travel times.
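To make the core idea concrete, here's a toy sketch of what turn detection could look like, and definitely not the actual hackathon code: flag a turn whenever the lateral acceleration crosses a threshold, then read travel times off the gaps between consecutive turns. The threshold value and the single-axis assumption are made up for illustration.

```python
# Toy sketch of the turn-detection idea, NOT the actual hackathon code.
# Assumes samples arrive as (timestamp, lateral_acceleration) pairs.
THRESHOLD = 2.0  # m/s^2, made-up value; a real one would need calibration

def detect_turns(samples):
    """Return the timestamps where a turn starts (acceleration rises above threshold)."""
    turns = []
    in_turn = False
    for t, ax in samples:
        if abs(ax) > THRESHOLD and not in_turn:
            turns.append(t)    # first sample of a new turn
            in_turn = True
        elif abs(ax) <= THRESHOLD:
            in_turn = False    # turn is over, arm for the next one
    return turns

def link_travel_times(turn_times):
    """Time between consecutive turns = travel time on that link."""
    return [b - a for a, b in zip(turn_times, turn_times[1:])]
```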
If you want to see the code, it's available here. Many thanks to my friend Michael for turning this from Fortran code written in Python syntax into actual Python code, making it a lot more readable, teaching me SublimeLinter, and also showing me how to use git.
I can hear people say "but... gyroscope? compass?". Well, the iPhone 3GS doesn't have a gyro, and I didn't even think about the compass, because I'm bad. The sensor app that buddy used has a nice feature: it serves the files over a very basic web server that I can crawl to download them.
So basically, the system consists of several steps:
- Get the .csv files to the local computer
- Upload the data in them to a database
- Get the data into a VM in Azure
- Process the data
- Upload the processed data back into the database for storage purposes
- Use the processed data (w/o calling it back from DB) to figure out the timestamps of the turns
- Impose the new travel times into a graph database
The first thing I did was open a trial account on Azure and configure the VM. I had never worked with any cloud systems before, but Azure was a very pleasant experience. It probably took 20 minutes to make a copy of the main image I used, and another 15 to set it up and get it running. It has done absolutely nothing in the last 4 days except run 2 databases and the occasional script.
While the cloud was getting ready, I told buddy to install Ubuntu on his old computer so I wouldn't need to screw around with Windows and Python, because it's just painful to work with RDP and non-standard Python packages (or at least I thought it was at the time). The computer was able to run 14.04 LTS, but had I known it was that old, I probably would have gone with a much lighter distro. And that's when the first major problem happened. Buddy was behind the college's NAT, and there was no way for me to directly SSH into his computer. The first idea was setting up a VPN via Hamachi, but we couldn't figure it out. Although our computers recognized each other, the full link was never completed (Hamachi's term for this is a "relayed link"). But wait a second, surely someone at some point has had to access a closed network without a VPN?
A quick search on the web turned up something called a "reverse SSH tunnel". Although I still don't fully understand it, it forms a bridge between the computer behind the NAT (let's call that remote_machine) and a computer outside (local_server). Using this bridge, you can use any third computer to get access to remote_machine. There are several guides online, but it was still a bit painful, and I haven't completely figured out SSH keys yet. Even after setting up the tunnel, we had constant dropouts, but I blame those on the old computer.
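For the record, the basic shape of the commands is something like this (the port number and usernames are placeholders, not our actual setup):

```
# Run on remote_machine (behind the NAT): forward local_server's port 2222
# back to remote_machine's own SSH port.
ssh -N -R 2222:localhost:22 user@local_server

# Then, from local_server (or after SSHing into it from any third computer),
# connect to the forwarded port; this lands on remote_machine.
ssh -p 2222 remote_user@localhost
```

The -R flag is what makes it "reverse": remote_machine dials out to local_server, so the NAT never has to accept an incoming connection.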
After getting that working, I started on the web scraper, which would run on the old machine. It's a very easy script: go to a URL on the local network, parse the HTML for <a href> tags, and get the URLs of the CSV files. It initially had a function to commit these CSV files to the Redis server, but I decided to just post the CSV files into the cloud instead, because the old machine was terribly slow at committing data to Redis (I'm talking about 2 keys per second here). This is where all the interaction with the local machine at the site stops.
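A minimal sketch of that scraping step could look like the following. The server address and the assumption that the app lists files as plain anchor tags are mine; the real script is in the repo linked above.

```python
# Sketch of the CSV scraper; the URL is a placeholder for the sensor
# app's local web server, not the actual address we used.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

BASE_URL = "http://192.168.1.50:8080/"  # hypothetical address

class CSVLinkParser(HTMLParser):
    """Collects the href of every <a> tag that points to a .csv file."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.endswith(".csv"):
                    self.links.append(value)

parser = CSVLinkParser()
parser.feed(urlopen(BASE_URL).read().decode("utf-8"))

for link in parser.links:
    filename = link.rsplit("/", 1)[-1]
    urlretrieve(urljoin(BASE_URL, link), filename)  # save next to the script
```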
The part after that was committing these files to Redis. This is the getcsv.py module. Nothing fancy: see what files are available in the folder where the downloads end up, and upload them to the Redis server using the predetermined naming convention.
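Something along these lines, assuming the redis-py client; the folder name and the key convention here are my placeholders, not the actual ones from getcsv.py:

```python
# Sketch of the upload step; folder and key names are placeholders.
import os
import redis

DOWNLOAD_DIR = "downloads"  # hypothetical folder the scraper writes into
r = redis.Redis(host="localhost", port=6379, db=0)

for name in sorted(os.listdir(DOWNLOAD_DIR)):
    if not name.endswith(".csv"):
        continue  # skip anything that isn't sensor data
    with open(os.path.join(DOWNLOAD_DIR, name)) as f:
        r.set("csv:" + name, f.read())  # one key per file
```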
Part 2 to follow soon...