News

  • How I Learned to Stop Worrying and Love the Code

    I’ve loved technology for as long as I can remember. Throughout my life I’ve moved from one fascinated obsession to another: from Legos to the original Nintendo and later the Commodore 64 my mother bought us kids, teaching myself to type with Carmen Sandiego, trying to install Yellow Dog Linux on my original iMac (orange, if I remember right), to motorcycles (a 1970s Honda CB360) and European cars (a 1980s Saab turbo), ham radio in college, and bicycles.

    Despite my lifelong love affair with technology, and despite an engineering degree, I didn’t learn to code until my mid-thirties. I honestly found it tedious and frustrating, largely because I didn’t have any mentors, except a few friends in college who were into gaming or file-sharing (sans license).

    While I knew how important and amazing computers were, the ability to code a full program eluded me. Once I started working in my field, energy efficiency, I simply didn’t have enough time. That changed when I struck out on my own as an independent consultant and, shortly thereafter, joined a startup with a web-based product, where I had the freedom and time (not always paid ;-) to explore the wild world of software development and techniques.

    Originally my intention was simply to replace my old tooling, largely based around Microsoft Excel and US Department of Energy (US DOE) software, with a data analysis and scripting language. I chose Python around 2012 and have never had a single regret. I’ve also dabbled with R from time to time and find that it fills a solid niche in my work as well, but I’m especially glad that I chose to invest in the Python / NumPy / pandas ecosystem. More recently I’ve made a push into web development with the Flask web framework, motivated simply by the earnest wish to share my work and make it available to others.

    Now, in the white-hot center of transitioning my career to software development and data science (maybe better described as a pivot, as I hope to remain in my original domain of clean energy), I am fully committed to becoming a full-stack developer and data scientist because I want to change the world and make money.

    This seems like the hardest problem to solve, and I will need the sharpest tools to do it. I’ve always been an avid learner; my thirst for knowledge is probably my greatest asset, and I’ve always subscribed to the saying that “it’s a poor craftsman that blames his [or her] tools.” I’ve had the foresight to select some solid tools, including Python and R, and I want to hone the craft of using them. That’s why I want to attend and/or mentor at a code school.

    I was instantly attracted to Holberton School for several reasons. First, its mission of diversity and community service speaks deeply to me, especially in these challenging political times, with regional disparities that we all know are unsustainable. I hope that I can become the change I want to see in the world, and that my involvement with the school will leverage the network effects of what seems to be an ideal community for achieving it.


  • State Department Cleantech Challenge - Code for San Francisco

    The Data Science Working Group (DSWG) of Code for San Francisco (C4SF) participated in a “Data Science Challenge” the weekend of March 31 - April 2, sponsored by the US Department of State and Booz Allen Hamilton (BAH) at the Galvanize campus in San Francisco.

    Our team consisted of eight members of the DSWG: Catherine Zhang, Jude Calvillo, Anna Kiefer, Juliana Vislova, Cherie Meyer, Peter Welte, Tyler Field, and Eric Youngson (myself), plus one person from the San Francisco Department of the Environment, Imma Regina Dela Cruz.

    All in all, this was a great experience; we made lots of great connections and learned a lot. As events go, a data science challenge is pretty new: basically a hackathon where, hopefully, raw data is turned into insights. Given the novelty of this type of event, there was some confusion over what the final product should be and what the role of data science is in the software development process. The tools are similar but the goals are somewhat different. In software development the end result is usually an application, typically a web application; for a data scientist, the end result, or at least the goal, is usually some kind of predictive model.

    The stated “Desired Outcomes of Solution” were as follows:

    • Open source, user-friendly data science applications, algorithms, and/or tools that allow users to identify where to build small-scale solar and micro-grids in Burma.
    • The solution must have great data visualization and be accessible to non-technical users.
    • The underlying approach to the tool or application should be scalable, and may be applied to other countries and data sets.

    As with most data-driven projects, a big part of the challenge was in defining the problem to be solved and determining whether the data we were provided was sufficient to solve it or whether more sources needed to be identified.

    An additional challenge was that, although the criteria were clear enough, there were many subtleties to the problem that only came out through discussions with subject matter experts (SMEs) and during the demo presentation, where we presented our preliminary solution and elicited feedback from a panel comprised of the organizers and SMEs. For example, we assumed that we could rely on the government’s electrification plan to exclude areas where the government was already planning to invest from our recommendations to the private sector.

    For data we were provided access to BAH’s “Sailfish” platform. Ultimately we explored several datasets found there, but the available data were not sufficient for our analytical approach. The dataset we found most useful, the 2014 census (the first in thirty years), we actually found on the open web.

    One aspect conspicuously absent from this event was any mention of the oppressed Rohingya minority, a group of roughly one million people who were notably excluded from the census data.

    The challenge opened with a happy hour Friday night, with opening remarks by Brian MacCarthy of BAH and a recorded message from Ambassador U Aung Lynn.

    Over the course of the next day and a half we discussed our ideas with subject matter experts and presented our proposed solution to the problem, as we had come to define it, to a panel of SMEs, officials from the US State Department, and employees of BAH. In addition to government officials from both the US and Myanmar, several for-profit and non-profit organizations were included in the SME panel:

    • BBOXX
    • Intersect
    • Infrastructure Development Company Limited

    We were provided the data in the form of access to a special subdomain of BAH’s Sailfish platform, put together specifically for this event. Unfortunately, we weren’t granted access until Thursday, just the day before the opening happy hour. We made do with the circumstances, though, given that the presentations started Sunday just after lunch.

    We started with a classic brainstorming session: we locked ourselves in a conference room and set a time limit, after which we would start assigning responsibilities.

    We identified four directions for exploration to inform our model.

    1. Predict need for access to electricity
    2. Predict willingness to pay for private investment
    3. Identify renewable energy resource potential
    4. Design user interaction

    After talking individually with the SMEs and presenting to the panel, we realized that renewable energy resources are fairly evenly distributed throughout the countryside; however, the households having all the indicators for enticing private investment are difficult to identify, for a number of reasons.

    The following census features, expressed as household rates per township, were selected by the team, in coordination with the panel, as available indicators of willingness to pay (a sketch of how they might be assembled follows the list).

    • Demand for electricity (mobile phone use as a proxy for consumer demand)
    • Employment status
    • Housing type & ownership status
    • Likelihood of being connected to the grid
    • Current energy use (lighting used as a proxy)
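
    As a rough illustration, assembling these indicators into a single per-township score might look something like the pandas sketch below. The column names are hypothetical stand-ins for the actual census fields, and the composite scoring is deliberately naive.

      import pandas as pd

      # Hypothetical field names; the real census extract used different ones.
      census = pd.read_csv("census_by_township.csv")

      indicators = pd.DataFrame({
          "township": census["township"],
          # Mobile phone ownership as a proxy for consumer demand
          "demand": census["hh_mobile_phone"] / census["hh_total"],
          "employment": census["hh_employed"] / census["hh_total"],
          "ownership": census["hh_owner_occupied"] / census["hh_total"],
          # Electric lighting as a proxy for an existing grid connection
          "grid_proxy": census["hh_electric_lighting"] / census["hh_total"],
      })

      # Naive composite: min-max normalize, average, then down-weight
      # townships that already appear to be grid-connected.
      cols = ["demand", "employment", "ownership"]
      scaled = (indicators[cols] - indicators[cols].min()) / (
          indicators[cols].max() - indicators[cols].min()
      )
      indicators["wtp_score"] = scaled.mean(axis=1) * (1 - indicators["grid_proxy"])

      print(indicators.sort_values("wtp_score", ascending=False).head(10))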

    First we decided to eliminate the areas identified by the central government’s planned grid development, but we were told by the panel that we couldn’t quite rely on the national electrification plan and would need alternative indicators.

    The main datasets we relied on to find the intersection of attributes indicating conditions favorable to private investment were the national census data and the geospatial locations of existing power lines.

    We set to work exploring a predictive model for willingness to pay based on existing indicators in the census data, such as the cost of lighting alternatives, assuming that households would prefer electric lighting, as it is safer than most of the available alternatives.
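
    In sketch form, assuming hypothetical columns for household spending on lighting alternatives (candles, kerosene, batteries) and using the electric-lighting rate as a stand-in outcome, such a model might start out as a simple regression:

      import pandas as pd
      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import r2_score
      from sklearn.model_selection import train_test_split

      # Hypothetical columns: spending on lighting alternatives per household,
      # with the electric-lighting rate standing in for willingness to pay.
      df = pd.read_csv("census_by_township.csv")
      X = df[["candle_spend", "kerosene_spend", "battery_spend"]]
      y = df["electric_lighting_rate"]

      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.2, random_state=0
      )

      model = LinearRegression().fit(X_train, y_train)
      print("R^2 on held-out townships:", r2_score(y_test, model.predict(X_test)))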

    Simultaneously, we worked on finding the buffer distance from existing power lines that would contain enough townships to match the current reported overall grid electrification rate for the country.
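
    A geopandas sketch of that search might look like the following; the file names and the national electrification rate are illustrative assumptions, and the real analysis would need the actual line geometries from Sailfish.

      import geopandas as gpd

      # Hypothetical file names; project to a metric CRS (UTM 46N covers
      # most of Myanmar) so buffer distances are in meters.
      townships = gpd.read_file("townships.shp").to_crs(epsg=32646)
      lines = gpd.read_file("power_lines.shp").to_crs(epsg=32646)

      REPORTED_ELECTRIFICATION = 0.34  # illustrative placeholder, not the official figure

      # Widen the buffer until the share of township centroids near a
      # power line roughly matches the reported electrification rate.
      grid = lines.unary_union
      for km in range(1, 51):
          near_grid = townships.centroid.within(grid.buffer(km * 1000))
          if near_grid.mean() >= REPORTED_ELECTRIFICATION:
              print(f"~{km} km buffer covers {near_grid.mean():.0%} of townships")
              break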

    Future work remains to validate the predictive model using multiple years’ data and to integrate the predictive scoring into the GIS visualization, creating an interactive tool for guiding investment and quantifying its potential impact and revenue for investors.

    Next Steps:

    1. Rigorously explore the variables in the census
    2. Form a hypothesis about the ideal data for predicting interest and potential to pay

  • Introduction

    For my first post, I’m going to start out by describing my development environment and how I got to a setup I like. I’ve followed many tutorials and gone down a few blind alleys, so if you’re interested in following my work, it follows that we might save some time by getting the fundamental setup out of the way. I hope this will save time and frustration for everyone involved down the line, or at least provide some amusement along the way.

    First, the general editing tools. I use the Bash shell for running code, version control, and networking. I use Bash, as opposed to other available *nix-based shells (Zsh, etc.), mostly because it is standard and available in most Unix and Linux environments. At first this may seem like a default position, but I’ve come to the philosophical perspective that for every default overridden there is an associated maintenance cost, not to mention, sometimes, an additional learning curve; but more on this later.

    I use Sublime Text for editing files and managing projects. I like Sublime mostly because it gets out of the way and lets me concentrate on the task at hand. The ecosystem is rich with plugins and options, but I probably use a smaller subset of these than average. One of the features I rely on most is “Vintage” mode, which lets me use vim navigation commands. This is a simple feature but a powerful one for (at least) two reasons: first, I don’t need to use the mouse (granted, this is not unique to either vim or Sublime Text, but it is fundamental to my workflow); second, it keeps me in practice with my vim keyboard navigation commands. Vim may seem like an antiquated technology to some, even a form of identity politics to others, but its real value to me is in its ubiquity: if you log into a server, it’s extremely likely that you will have vim at your disposal (and Bash, for that matter).

    Now for the language-specific tools. My default language is Python. I chose Python for (at least) three reasons: first, it’s (you guessed it) relatively ubiquitous; second, it’s enthusiastically recommended by a large number of users across a wide range of development domains, including all the major domains in which I intend to work, those being:

    • Data analysis
    • Scientific modeling & simulation
    • Web development
    • System administration & automation

    Python setup for OS X