New (ish) Shogun Website!

I haven’t had much time for machine learning on the side, since my new job and learning more about web development have been keeping me pretty busy, so I decided to help out the Shogun team by doing some clean-up on the website. The whole thing came about because I was trying to find the current status of the Shogun build to send to someone on the mailing list and couldn’t find it on the old site. Needless to say, it was tough to navigate. My main goal with the redesign was to “flatten” the navigation into a navbar rather than the old system of nested navigation links. (As an aside, the way this worked was pretty poor: we queried a large portion of the database on every page load just to auto-generate the nested navigation.) The project ended up being mostly front-end, which isn’t really my thing, but it’s cool because I got to learn some new stuff. I also got to write some Django “migrations” and maintenance tasks, so there was a bit of backend too. I think the redesign was a success and the site is definitely easier to navigate. As a bonus, we no longer have to query the whole db on each page, wooo! The solution: put the navbar content in the db too.

You can see the new site live where it’s always been: http://www.shogun-toolbox.org/


Gmail_ToDo!!

Do you use your inbox as a running ToDo list? Do you send yourself emails so you remember to do certain things? Do you spend most of your day in a terminal? If you answered yes to all 3 of those questions, then you might be interested in this nifty little Ruby gem I just released. It’s called gmail_todo and it’s made for quickly emailing yourself a ToDo note from the command line. Think of the precious seconds you’ll save! Now when you are in a terminal and remember something you need to do, rather than alt-tabbing (or god forbid reaching for the mouse) you can quickly type todo "get milk" and voila, an email will appear in your inbox. As an added bonus, my gem prepends [ToDo] to your subject for easy filtering!

Check it out on GitHub and RubyGems:

https://github.com/kevinhughes27/gmail_todo

http://rubygems.org/gems/gmail_todo
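
For anyone curious what a tool like this boils down to, here is a rough sketch of the idea in Python rather than Ruby. It is not the gem’s code: `build_todo_message` is a hypothetical helper, and using the note as the message body is an assumption. The only behaviour taken from the description above is that the note is mailed to yourself with [ToDo] prepended to the subject.

```python
from email.message import EmailMessage


def build_todo_message(note, address):
    """Build a note-to-self email (hypothetical helper, not the gem's API)."""
    msg = EmailMessage()
    # The message goes from you to you, so both headers use the same address.
    msg["From"] = address
    msg["To"] = address
    # Prefix the subject so an inbox filter can pick the notes out.
    msg["Subject"] = "[ToDo] " + note
    # Assumption: the note doubles as the body.
    msg.set_content(note)
    return msg


msg = build_todo_message("get milk", "me@example.com")
print(msg["Subject"])
```

Actually sending the message would be a separate step, for example through `smtplib` with your own credentials.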


Mailcheck.js in Production

I was recently tasked with adding Mailcheck.js to some of our production pages and I want to describe a bit of the process I went through because I did some things a bit differently and had some fun along the way.

Let’s start with a PSA: do not simply drop Mailcheck onto your website as is! In my experience the default algorithm is way too greedy, i.e. it will suggest that most emails should be ____@gmail.com. It is worth taking the time to tweak Mailcheck for your particular userbase; no one wants to see a correction for their proper email address!

The first thing I did was dump a ton of emails from our database to create a dataset to work with. I could have used Node to write some scripts to test out the Mailcheck behaviour, but Python is just so much more convenient for doing numerical analysis. Plus it’s what our data team uses, so I could leverage some of their knowledge and code. So now for the fun part: I ended up using PyV8 (a Python wrapper for calling out to Google’s V8 JavaScript engine). With this setup I was able to slice and dice through our production emails using Python and pandas, calling the exact JavaScript Mailcheck algorithm and collecting my results. After tweaking the algorithm I could take the settings and new JS code and put them in production.

Check out this wacky franken script that got the job done (pandas not included):

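import json

import PyV8

ctxt = None  # shared V8 context, created by init_mailcheck()


def init_mailcheck():
  # Load mailcheck.js into a V8 context that stays entered for the whole run.
  global ctxt
  ctxt = PyV8.JSContext()
  ctxt.enter()
  with open("mailcheck.js") as f:
    ctxt.eval(f.read())


def js_str(value):
  # json.dumps produces a quoted, escaped JavaScript string literal, so an
  # address containing a quote (e.g. o'brien@example.com) can't break the script.
  return json.dumps(value)


def run_sift3Distance(s1, s2):
  script = "Mailcheck.mailcheck.sift3Distance(%s, %s)" % (js_str(s1), js_str(s2))
  return ctxt.eval(script)


def run_splitEmail(email):
  script = "Mailcheck.mailcheck.splitEmail(%s)" % js_str(email)
  return ctxt.eval(script)


def run_mailcheck(email):
  script = "Mailcheck.mailcheck.run({ email: %s })" % js_str(email)
  result = ctxt.eval(script)
  if result:
    try:
      # A suggestion comes back as a JS object; rebuild the full address.
      result = result.address + '@' + result.domain
    except AttributeError:
      pass

  return result


if __name__ == "__main__":
  init_mailcheck()
  print(run_mailcheck("kevinhughes27@gmil.com"))
  # >>> kevinhughes27@gmail.com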
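
To give a flavour of the slicing and dicing, here is a minimal sketch of one measurement that is useful when tuning: for each domain in the dataset, what fraction of addresses would get a correction prompt. This is not the original analysis (that used pandas). `summarize_suggestions` and `fake_suggest` are hypothetical names, and the stand-in suggester exists only so the example runs without PyV8. In practice you would pass in a function shaped like run_mailcheck instead.

```python
from collections import Counter


def summarize_suggestions(emails, suggest):
    """Return, per original domain, the fraction of emails that get a suggestion.

    `suggest` is any callable that takes an email and returns either a
    suggested address or a falsy value when it has nothing to suggest.
    """
    total = Counter()
    corrected = Counter()
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        total[domain] += 1
        suggestion = suggest(email)
        if suggestion and suggestion != email:
            corrected[domain] += 1
    # A high rate on a legitimate domain means the algorithm is too greedy there.
    return {d: corrected[d] / float(total[d]) for d in total}


def fake_suggest(email):
    # Hypothetical stand-in for run_mailcheck: it "corrects" a real typo
    # (gmil.com) but also, greedily, a perfectly valid domain (gmx.com).
    address, domain = email.rsplit("@", 1)
    if domain in ("gmil.com", "gmx.com"):
        return address + "@gmail.com"
    return None


rates = summarize_suggestions(
    ["a@gmil.com", "b@gmx.com", "c@gmx.com", "d@gmail.com"], fake_suggest)
for domain, rate in sorted(rates.items()):
    print(domain, rate)
```

Sorting the domains by rate and eyeballing the legitimate ones near the top is a quick way to spot where the suggestions are too eager.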
