The Future of my Internship

(The start of something new.)

Officially I have a new project! Luckily it overlaps a bit with the work I’ve done on the shipping bug so I have an idea where to start. The project is about closing a small, rare gap in Bandcamp’s otherwise impeccable bookkeeping, having to do with recording the fee that PayPal charges on every sale. This information is retrieved from PayPal in the final stage of the express checkout (PayPal Express Checkout seen in My First Bug). In this stage we retrieve the fee amount along with some other details like tax amount and payment status by making the API call DoExpressCheckoutPayment. The problem is that under certain (previously) mysterious circumstances the fee does not get included in this information. From observation of the logs, it looks like when the payment status variable is ‘Pending’ then the fee isn’t returned. Kinda makes sense. Then by scouring the webapp logs we can observe what kinds of transactions are pending.

Look familiar?


As far as pending transactions are concerned, I’ve found that there are two reasons: echeck and unilateral. An echeck transaction means an electronic transfer, where the payment is made with a debit/bank card, and a unilateral transaction means that PayPal doesn’t have an account set up for the band yet. PayPal lets the transaction go through and holds on to the money until the band makes their account. Eventually when the payment is processed the PayPal fee is finally calculated. This can be days or weeks after the sale, and Bandcamp currently has no way to know what it is.

If you have your own band, you can look at your sales records by visiting your band’s page and clicking on stats in your admin menu bar at the top of the screen. Then select Sales/downloads and you’ll probably see a cool graph displaying all the instances of sales over time that will distract you, but what you’re really interested in is the Tools page link at the bottom of the page. On that page you can export any increment of your sales records into a csv file with all the nitty gritty details about every payment. Here is an example of what it looks like in a band’s records right now:

Screen Shot 2013-06-27 at 1.37.10 PM

See the two that I marked yellow? Those are eCheck purchases, for which we didn’t receive the fee amount. By the end of the summer those are going to be filled in!

PayPal does provide a mechanism for informing us of the final fee, called Instant Payment Notification (IPN). Bandcamp can ask PayPal to send this at some future point after its sale, and then the fee would be updated in the database. This is my project.

A couple ideas that sprung to mind about how to go about this, including investigating the PayPal fee calculation to perhaps calculate a temporary ‘expected’ fee until the real fee is calculated. Also, in the database table where the PayPal-specific data for a transaction is stored, I should add a status column indicating whether the payment  is ‘paid’ or still ‘pending’, if that’s not already noted. I definitely need to start looking into how the payment tables in the database are set up.

Some specific questions that come to mind:

  • How are we storing DoExpressCheckoutPayment results (what tables)?
  • Are we storing the status of the transaction?
  • What are the options for statuses stored?
  • Do we store the pending reason?

But before I got the chance to answer these questions, I very embarrassingly broke my development environment. Did I finally push my 5 year old laptop into mechanical failure or was my computer simply in need of a reboot? Tune in next post for the exciting conclusion!


Playing with SQL

(My favorite song from a friend’s band in Vancouver.)

Next up, I’ve been asked to figure out some statistics about trends in purchasing, specifically, what percent of purchasers in the last two years have only bought digital items? This is going to be an SQL query on one of the slave database servers named munda. The slave, munda, mirrors the changes made by the two master servers dwayne and morris. Managing multiple master servers by dealing with concurrency complications is handled by the Multi-Master Replication Manager for MySQL. The idea behind a slave is that it takes the pressure off of the master servers by responding to most of the read-only queries. These servers are really huge and always active. Hence, while I’m learning I need to be particularly careful when making a query on one. In fact, every time I wanted to run a query I had to send it to Joe and have it approved first.

This was a little bit intimidating because I didn’t get to show my boss the beautiful final product, but instead, had to advertise every mistake along the way. Although, I liked the process. Usually I’m used to trying out whatever idea my mind wanders to rather than logically stepping through each thought completely in an organized fashion. I really liked seeing how I properly problem solve step by step, and after a few show tables and describe commands, I had written my first query.

In order to make sure I wasn’t stressing the database, we had a look at a MySQL version of top called innotop to monitor the servers. From this you get information about cache usage, bit transfer and a list of queries. In addition this information is divided into queries per second (QPS), query cache hits (QCacheHit), use of the buffer pool cache (KCacheHit), bits per second incoming (BpsIn), and bits per second outgoing (BpsOut). All of these items are shown in a table of rows ‘Now’ and ‘Total’ so you can compare the current stress to the normal work the server performs. So, while running innotop in a neighboring window, I executed my first query:

FROM sale_items
WHERE sold_date >= '2013-05-31';

Below the innotop stats there is a list of the queries running at that moment and details about how long it has been running, and sure enough mine popped up there. After 30 seconds the color of the query changed to green and again changed after a minute to yellow. If the query has been running too long and looks like it is affecting the speed at which queries are being processed, you can kill it by typing ‘x’ and entering the query id. Joe had me practice killing queries by running


The payment_id query above didn’t seem to cause too much unpleasantness and finished in just over a minute. The result gave me a list of payment ids indicating all the payments in the last week. Next I tried this, still leaving innotop open:

SELECT COUNT(distinct paypal_auths.payer_paypal_email) 
FROM sale_items
JOIN payments     ON sale_items.payment_id = 
JOIN paypal_auths ON payments.auth_id = 
WHERE sold_date >= '2013-05-31';

Now I was getting something more interesting, a count of how many different buyers there were via joining all the appropriate tables. This query ran a little bit faster and didn’t quite graduate to the yellow state. Next I started thinking about how to distinguish between types of sales. The sale item records are tagged with an item_type, but I didn’t know what the values could be:

SELECT item_type 
FROM sale_items 

From the semi-random result I discovered that ‘a’ meant album, ‘p’ – package, ‘t’ – track, and ‘s’ – service (like subscribing to Pro accounts). Then I realized I wanted to be executing two different queries, one for discovering how many distinct buyers have purchased just a package:

SELECT COUNT(distinct paypal_auths.payer_paypal_email) AS packages_sold
FROM sale_items
JOIN payments     ON sale_items.payment_id =
JOIN paypal_auths ON payments.auth_id =
WHERE sale_items.sold_date >= '2011-05-31'
    AND sale_items.item_type = 'p'
    AND sale_items.state = 's';

and one for how many buyers of any kind of music (i.e., not a service):

SELECT COUNT(DISTINCT paypal_auths.payer_paypal_email) AS albums_tracks_packages_sold 
FROM sale_items
JOIN payments     ON sale_items.payment_id = 
JOIN paypal_auths ON payments.auth_id = 
WHERE sale_items.sold_date >= '2011-05-31'
    AND sale_items.item_type != 's' 
    AND sale_items.state = 's';

From these two queries I was able to obtain enough information to answer my original question. Of course for confidential reasons (and fear for my life) I can’t say what the result was, but it is awesome.

I think I have an idea about what I’ll be doing next week, but let’s leave that as a cliff hanger.

Fixing the Bug

(Check out Bandcamp’s new embedded players!)

Finally it’s time to fix this shipping bug. So mainly what I’ve got to do is fix the receipts sent to both buyer and seller by modifying the liquid templates. In the buyer receipt I need to add the mailing address and provide a means of contacting the artist if the mailing address needs to be changed. Through a bit of manipulating and playing with stylistic changes, this is what a receipt now looks like (my changes are shown in the 2nd gray box):

Screen Shot 2013-06-10 at 6.45.02 AM

In the seller receipt I need to add the sku (an inventory id for packages that can be turned on by the merchant) and link the band’s name to their home page. This was a little more complicated than just updating the liquid files because I needed to figure out where to find the sku in the database. Turns out it’s stored in three tables: packages, package_options, and sale_items. In theory the sku number should never be different from table to table, but since the band can change the sku at any time, it’s best to fetch it from the sale_items table. The sale_items table takes a snapshot of the package information when a sale is made, this way the information is identical to what the user saw.

The liquid template takes a ruby hash called items with the receipt information, and the hash is filled in from a large query on the database. So in order to get the sku, all I had to do was adjust that query. The query was already selecting from sale_items but the sku is stored as an id, not text, and the id corresponds to an entry in another database table called sale_item_syms, which maps the id to the actual text. This is a clever way of saving storage space in the database by only storing a string once. Therefore, after adding a join in the query with the sku sym_id, the items hash now has the sku text and is available in the receipt template. Check out the linked band name and sku now seen in the seller receipt:

Screen Shot 2013-06-19 at 7.57.37 AM

And that’s it! Internship over. Just kidding.

The Race Condition

(Still not exactly sure what cupcakin’ means but super catchy nonetheless.)

From what I’ve come to understand, a race condition is a term used in computing when a time delay in operations could result in a different outcome. I was first introduced to this idea last Tuesday when I went to Bennington College to watch my friend Alexa give her final project presentation for the class Data Structures in C. Her project was an image-processing command line tool (for her astrophysics research) using threads and a queue data structure. To demonstrate its functionality, her program read in a large image and inverted the pixels. The threads were used to perform the pixel inversion on multiple bands of the image at the same time, greatly decreasing the run time. At the end of the presentation, Alexa cheekily announced, “Wait! I forgot to tell you about my race condition.” She had noticed that two threads could be competing for the same band of the image in the queue. In the time it takes thread 1 to check if the queue was empty or not and dequeue a band, thread 2 could also check the queue and dequeue it, and this would cause a crash if it was the last band. One of the parts of the project was that she made the queue thread-safe with a mutex, but she realized that the check (queue empty?) and the dequeue were in different areas. She jokingly said a solution to this might be to make the threads perform one at a time, but that would defeat the purpose of using threads in the first place.

Coincidentally, the next day at work I started writing my first piece of code for Bandcamp and I discovered I had a race condition of my own.

Liquid is the name of a templating language Bandcamp uses. It can be used similarly to other web scripting languages that are integrated into the HTML of a page. Liquid enables us for example to easily send email receipts with a fixed format, but with different information in each email. Liquid is the code that I’m going to need to change if I want to add anything to the email receipts. But first I’m going to need a method to test the changes I’m making.

This is where the local version of the Bandcamp website that I installed on my first day will come in useful. Since my local site has no users, we use a part of the Bandcamp codebase called Faker.rb to quickly generate a test band account, fan account, both, or make anonymous sales. In order to test the receipt that would be sent out, first I would need to generate a fake band and then create anonymous sales for it. But, the code for making a fake band just creates digital albums for it, and I need to test shipping addresses, which means I’m going to need to make sales for merchandise. You know what that means: I get to write some code!

By looking at the ruby code in the Faker class that makes albums (and a little help from debugport, a sort of irb backdoor to a running ruby program), I was able to create a similar function called make_package that makes a fake piece of merchandise for the first album it makes — a t-shirt!

Screen Shot 2013-06-09 at 10.45.58 AM

In the test interface it worked! But the code wasn’t quite right yet.

Essentially all that this code needed to do was update the two database tables where the package information is recorded. By the way, something I just learned is that inserting strings taken directly from the user can be dangerous! Certain key characters in SQL like , , and =, if directly inserted into a SQL statement, can alter the query and be used for nefarious purposes. This is called a SQL injection attack. Scary, eh? I had never even considered this could become a fault, let alone be a security problem. To remedy this, there is a way to escape special characters in SQL using the backslash character to turn special characters into literal strings. So in practice, all variables should be ‘escaped’ before being inserted into a SQL query.

With the escape statements inserted, now my SQL queries are safe and I’m getting the output I want. Then I found my race condition. So, just for fun, here is my code. See if you can find the race condition — think about what could happen in the database if two processes call make_package at the same time.

def make_package(album_id, band_id)
    q_band_id, q_now = SQL.escape(band_id,
    title = "#{@key} fun t-shirt"
    q_title, q_album_id = SQL.escape(title, album_id)

    # create package
        INSERT INTO packages
        SET title = #{q_title}, type_id = 11,
        new_date = #{q_now}, mod_date = #{q_now},
        shipping_local = 10, shipping_regional = 10,
        shipping_intl = 10, fulfillment_days = 1, private = NULL,
        band_id = #{q_band_id}, is_set_price = 1, price = 10,
        quantity = 20, sku = 'PFB6-PFA1-PFTS',
        new_desc_format = 1, album_id = #{q_album_id}

    # generate new index for this package (where it is in the list)
    index = SQL.query("
        SELECT MAX(tp.index) AS max_index
        FROM tralbum_packages AS tp
        WHERE tp.tralbum_id = #{q_album_id}
    ").collect.first.max_index || 0
    index += 1
    q_index = SQL.escape(index)

    # create association between album and package
        INSERT INTO tralbum_packages 
        SET tralbum_type = 'a', tralbum_id = #{q_album_id}, 
        package_id = #{q_package_id}, `index` = #{q_index}

Find it? Look at where I first assign the variable index using a SQL query. If there is more than one item being sold that is associated with the same album, it needs to have a different index. So, I grab the maximum index in one query, add one to it, then insert a row using the new value. The race condition is this: an index may be duplicated if a package is added to or deleted from the album in the time it takes this code to update the index, escape it, and perform the SQL insert statement. Using this code you could either insert an index that is too big or one that already exists in the table. You have to consider that a huge number of changes can occur between every line of your code. In this particular case the race condition isn’t much of a problem because this is only test code that’s used in the development environment where there isn’t exactly heavy traffic. But it is important (and pretty cool) to be able to recognize these types of bugs.

Terrific! Now that I’ve updated the testing functions I can test the code I will write to update the receipts. Wish me luck!

My First Bug

(Here are some sweet tunes to listen to while you’re reading that will come up later.)

Something exciting about my work at Bandcamp thus far is that I’m rarely sure what I’m going to do next. The tentative plan coming in was to help one of the engineers, Michael, with his corner of Bandcamp: fan accounts. But that idea kind of dissolved when Michael became busy with orienting a new hire. So Joe has some project ideas that will be reasonable for someone with my experience and limited time here, and will also teach me a lot in the process. This is where I’d like to introduce my first bug, the shipping bug.

The fix that I had to make was to add the shipping address to the email receipt that a fan receives when she purchases merchandise, like an LP. Yep, that’s it. Although there is some more debugging that could be done. Several artists and labels had suggested this  because on occasion buyers weren’t receiving their purchases, and they’d find out later it was because items were being sent to the wrong address. By including the shipping address in the receipts, fans could double-check and contact the band to make corrections before it’s shipped out. A seemingly easy change, yes, but we also wanted to see if there was a reason the wrong address was being used in the first place.

The shipping bug was initially not even a bug at all, more of an edit to a print line. But nonetheless, it was an important enough glitch to be listed as a bug and logged in FogBugz. Fogbugz is a web tool based on the understanding that code is always evolving, and its vision is to provide a platform to support that reality. Buggy code is simply a fact of life and bugs are created even by the best programmers. Even if you write a beautiful program that fits the specs exactly and minimizes time and storage use, the technology and the program’s functionality will always be changing. Continually making improvements is what keeps a product competitive and desirable to consumers. So before I could mark this bug as ‘resolved’, I needed to know more about how Bandcamp processes sales.

Bandcamp uses PayPal to deal with payments, and all transactions are processed with PayPal’s Express Checkout. PPEC is a pretty well-written program that communicates with your website in order to exchange necessary information. The relationship is shown in the diagram below (courtesy PayPal’s website). First the request from our website is made to PayPal (PP) and a token is returned to indicate PP is ready for the transaction. The user is then redirected to the PP payment pages to login or enter their credit card info and the payment information is recorded. Then the user is returned to Bandcamp and Bandcamp sends a request to PP to make the payment. When that succeeds a final request is made to PP for the API operation GetExpressCheckoutDetails to retrieve all the details about the transaction. Finally, Bandcamp puts the information together in a receipt email. Phew.

(Sandbox is the testing interface for merchant sites)

(Sandbox is the testing interface for merchant sites)

Lucky for us PPEC is smart and knows that the user needs to enter a shipping address if they are ordering something physical. If you have a PP account you already have a billing address on file, but at payment time you can enter a different shipping address. Having two addresses, the suspicion was that maybe Bandcamp was grabbing the billing address instead of the shipping address and this would mean that packages would be sent to the wrong place, in the cases where the billing and shipping addresses were different. So, an investigation ensued! I spent a few hours exploring the information that PP sends back to Bandcamp in the GetExpressCheckoutDetails call. Turns out there’s a lot.

It was time to view some live transactions on one of the servers. Something I didn’t mention last post was the distributed system that Bandcamp has. They divide the work on the servers (located mostly at a colocation facility in Texas which no one at Bandcamp has ever been to) into eight major processes:


The DAEMON is a process that works on its own to provide support for the custom distributed file system, but the rest of them work directly because of buttons you push on the website. In particular, when an artist uploads a song or a picture an UPLOADER works to collect it and place it in the file system. A TRANSCODER turns an uploaded audio file into different formats which a DOWNLOADER can then retrieve for download or streaming.

Bender1 (all of the servers are named after Futurama characters) is one of the many WEBAPPs. So we pulled it out of rotation, I ssh’d into it and we started looking through the log file (using less and grep) to find some merchandise purchases, noticing the kinds of shipping fields that are filled in when merchandise purchases are made. They look something like this:


Looks like quite a bit of nonsense and you’ll also notice a lot of the information is duplicated, but everything we need should be there. After looking at several of these and referring to the PP docs, we determined that they were consistently returning only one address. So, I tried to recreate the bug by making some transactions myself where my own shipping address was different from my billing address (using the out-of-rotation server so my actions were easy to follow in the log). As a test, I bought the LP of “The Cards” by Bevel Summers. (REALLY good album by the way, highly recommend it.) Then I checked out the logs again and we did in fact receive the correct address from PayPal.

From these tests we concluded that Bandcamp was always using the shipping address that the user specifies. No bug in the Bandcamp code, and showing in the receipt where the package will be sent seems like a reasonable solution to the problem. So, the next step is learning Bandcamp’s templating language liquid to change the email receipt.

And so it begins

First Day

(Feel Ma Peine was the first song I ever heard on the BC Weekly and it was love at first listen!)

The funny thing about this post is that it was meant to be finished on my first day and it is now day 4. There are many excuses, but the overlying reason is that I hadn’t completely finished thinking about how I wanted to write this blog. It turns out it was more simple than I had imagined. Rather than reflecting on what I had learned at the end of the day and succinctly composing lists about how to be an intern, I just need to be an intern. And in this state of being, I think these posts will communicate best. The tips and advice should be read between the lines while I discuss my day-to-day learning. In the words of my boss Joe, “Keep it contemporary.”

Day 1 was a big deal. Mine felt especially so, not just because it’s the first day on the job, but because a) I am the first intern that Bandcamp has ever had and b) I came all the way to Vermont for two weeks of orientation and friend visiting! Props to Bandcamp for putting that together.

Joe’s plan is to spend a lot of time one-on-one getting me set up and familiar with Bandcamp development. The day started out fairly easy going. First, setting up at South Street cafe (not to be mistaken for Second Cup as the Canadian in me always does), we began customizing my laptop for the months of work ahead. I was going to have a computer that runs its own local version of the entire Bandcamp website.

Once confirmed that I had Mac OS X 10.8.x and ruby 1.8.7 as the default, I got access to the Bandcamp subversion repository and checked out a magical ruby program called setup.rb that updates and installs everything needed for Bandcamp, including software like

just to namedrop a few. Honestly, from this list, I only have experience with python and MySQL and know very little about the rest of the items. Turns out there are a lot of tools involved with running a website like Bandcamp. From 5 minutes of googling I have discovered that SoX, lame, ffmpeg, and FLAC are all involved with streaming and encoding audio files. I’m not sure how much I will be interacting with all of these tools, but hopefully I’ll learn a little bit more about the roles they play and have a better understanding of the complexities involved with running a music platform.

Half way through running setup.rb we relocated to another cafe. There we had a look at some live stats using commands like netstat and top. So since I’ve mentioned the command line, the one thing in particular that I wish I’d spent more time on in life is playing with and setting up Terminal. Joe set me up with a few shortcuts that he uses, like:

alias t='cd ~/bc/bandcamp/trunk'
alias l='ls -lFG'
export LESS='-fMnQRSPw%f line %lb'

Knowing useful tools on the command line would make my quality of life 100,000,000,000,000,000 times better. And I’m not even exaggerating. In particular, a command that I hadn’t used but Joe has insisted I become best friends with is less (notably instead of using vi to look at files). Also being more than just competent with vi would be a good idea. Hmm, I have an exciting few weekends ahead!

Over lunch Joe and I discussed a rather controversial topic about gender inequality in the workplace in technical fields and tossed around ideas for why this might be. This topic has been very interesting for me, especially in the last year when a friend and I began a Women in Engineering and Computer Science group at UVic. Our conversation developed into talking about outreach and how to get started in tech. Then a light bulb went off and the Bandcamp Intern blog was conceived.

After lunch setup was complete and I was able to run a copy of the site on my laptop. I set up a band account here and also on the website in production to use for testing. I put in a sound bite and everything!

Day 2

Day 2 began with the result of the conversation that we had at lunch on Day 1. We had had a really interesting discussion about what it means to be an intern at Bandcamp and what sort of responsibilities that includes, so it was a really unusually obvious direction to take when we decided to document and publicize the experience. Essentially today was allocated to becoming an internet writer, and it really turned out to be the perfect day for that task.


After some morning coffee from Second… er, South Street, I headed to the beautiful  campus of nearby Bennington College. There I went straight to the library and set up irc, which Bandcamp uses for almost all its daily internal communications, and started reading. Turns out there are some beautifully written blogs in the Computer Science field, and there are also some really boring ones. Maybe with more experience I will appreciate them more, but the bloggers that really stood out for me were Steve Yegge and Joel Spolsky. In particular, there was one post that Yegge wrote that I had read last year during my previous internship (on a recommendation) and hadn’t remembered till after I had read it all the way through and fallen in love with the article all over again. This is a MUST read.

Later in the day, Joe and Ethan (one of the founders of Bandcamp) were investigating an odd sales dip on the previous Sunday. Joe was looking at admin pages full of stats and graphs and at nagios, that display all sorts of information about what types of things are being purchased, from what referrer the customer came from, whether the error occurred during the purchase, etc. and asked me to help out.

Something that you should probably know about me is that python is my main squeeze. It’s the language I’m most comfortable programming in, and I was excited when Joe asked me to process some data queried from their database and analyze it because I then had the opportunity to use python’s SciPy library. Unfortunately, I wasn’t prepared  and I didn’t have SciPy installed. So, when it took me more than 30 minutes to try to install the library, I had to resort to Excel. The plots turned out interesting and our results concluded that it was most likely a coincidence with no direct cause, but I should set up SciPy for next time.

The rest of the week seems to be focused on one bug in particular that I have been given the official authority and responsibility to solve, and that certainly deserves its own entry.

What I wish I knew 3 years ago.

If you could go back to your first job interview, nay, to the first time you read a job description and thought “Shucks, this job sounds amazing, but how do I even get started to ever being considered?”, what would you tell yourself? What do you wish you knew to have the confidence to get you to that dream job? For me, that experience was three years ago, when I had just finished my first year of Computer Science at the University of Victoria.

Three years into my university career I’d finally started something that I could see myself doing for the rest of my life. Three years in and I was sick of working at supermarkets and coffee shops. I was ready for something applied, something that involved problem solving, and something that was actually challenging. I wanted an internship. With one year of ‘relevant’ experience and some mustered, almost artificial, confidence, I set off applying for all the most exciting technical positions I could find.

Somehow my confidence paid off and I received the phone call for a job offer at an astronomical observatory in Hilo, Hawaii. So, doing what? I had no idea. Regardless, I screamed in the middle of the campus cafeteria when I found out. Kid you not. My initial reaction expressed my emotions fairly obviously, but then the idea of performing in a job position at such an awe-inspiring institution sunk in.

At first I was perplexed: “Why would they choose someone so inexperienced as ME?” then horrified: “What on Earth am I working on for four months and how can I even pretend to be prepared for the first day?” then miserable: “I’m going to f*$% this up so badly.”

Regardless of this panic attack, I didn’t fail. And, after the first four months I was hired on for another internship the following year. So, how did I do it? How did I perform a successful internship without knowing what to expect? And what would I tell myself three years ago? This question is one that my new boss at Bandcamp proposed on my first day. So, in the following posts, I’m going to try to make technical positions feel more accessible to the novice engineer. By documenting my experience, I will hopefully provide some insight into what a beginner can expect, in particular thinking about what would benefit someone like me three years ago.