(Good, local music.)
Time to investigate this problem with the PayPal fees! First question is where to get started. My first thought is that I need to understand what the tables look like and figure out how Bandcamp currently processes the fee.
Usually when I start a project in school, I open a new file and begin writing code before I even properly understand what my objective is. Poor practice, I know, but with only my future self to worry about it’s hard to resist writing code straight away. This simply isn’t an option when working with an existing codebase for a company like Bandcamp because everything is integrated. So, I need to begin each project by learning a lot and figuring out exactly what I’m doing. When I first started looking into this problem there were a few questions that were getting me started with my research. Though, upon further investigation I think I came up with more questions than answers. I began with figuring out what the data that needed processing looked like.
When we get the PayPal fee from DoExpressCheckoutPayment, we store the information in a table called paypal_auths (see below). This table doesn’t necessarily store every detail about the transaction, but it does store a payment status and transaction ID. In fact, it stores 2 statuses, one simply called status that Bandcamp assigns to the payment and one called payer_paypal_status which PayPal provides. I will need to query the tables further to really understand what those mean. The transaction ID is useful because when PayPal sends you the IPN (instant payment notification) after the transaction has completed, you can use that id to update the table. We know that the PayPal fee comes from this magical IPN signal, but I don’t know how to receive it or what it looks like. Apparently one of Bandcamp’s engineers, Robbie, had already started the movement towards grabbing the IPN notifications, so there is a collection of code somewhere that I need to find that will eventually (with my help) get the fee information for a transaction that was previously pending. Just to get an idea of what the data we store looks like in paypal_auths, here is what a table describe spits out:
+-------------------------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------------------+------------------+------+-----+---------+-------+ | id | int(10) unsigned | NO | PRI | NULL | | | status | varchar(20) | YES | | NULL | | | checkout_date | datetime | YES | MUL | NULL | | | order_total | float | YES | | NULL | | | item_name | varchar(127) | YES | | NULL | | | item_desc | varchar(127) | YES | | NULL | | | is_set_price | tinyint(4) | YES | | 1 | | | payer_firstname | varchar(40) | YES | | NULL | | | payer_lastname | varchar(40) | YES | | NULL | | | payer_business | varchar(127) | YES | | NULL | | | payer_paypal_id | varchar(13) | YES | MUL | NULL | | | payer_paypal_email | varchar(127) | YES | MUL | NULL | | | payer_paypal_status | varchar(10) | YES | | NULL | | | payer_countrycode | varchar(2) | YES | | NULL | | | paypal_token | varchar(20) | YES | MUL | NULL | | | paypal_correlation_id | varchar(13) | YES | | NULL | | | paypal_api_version | varchar(8) | YES | | NULL | | | paypal_build | varchar(12) | YES | | NULL | | | ip_address | varchar(39) | YES | | NULL | | | payment_date | datetime | YES | MUL | NULL | | | fee_amt | float | YES | | NULL | | | paypal_transaction_id | varchar(19) | YES | UNI | NULL | | | paypal_payment_date | datetime | YES | | NULL | | | paypal_payment_type | varchar(7) | YES | | NULL | | | currency | varchar(3) | YES | MUL | USD | | | usd_total | float | YES | | NULL | | | quantity | int(11) | YES | | NULL | | | sub_total | float | YES | | NULL | | | shipping | float | YES | | NULL | | | tax | float | YES | | NULL | | | payer_shiptoname | varchar(60) | YES | | NULL | | | payer_shiptostreet | varchar(150) | YES | | NULL | | | payer_shiptostreet2 | varchar(150) | YES | | NULL | | | payer_shiptocity | varchar(60) | YES | | NULL | | | payer_shiptostate | varchar(60) | YES | | NULL | | | payer_shiptozip | varchar(20) | YES | | NULL | | | payer_shiptocountrycode | varchar(2) | YES | | NULL | | | payer_shiptocountryname | varchar(60) | YES | | NULL | | | payer_note | varchar(255) | YES | | NULL | | | ship_date | datetime | YES | MUL | NULL | | | email_date | datetime | YES | | NULL | | | ship_notes | text | YES | | NULL | | | payer_shiptophonenum | varchar(30) | YES | | NULL | | | item_price | float | YES | | NULL | | | discount_code_id | int(10) unsigned | YES | MUL | NULL | | | seller_paypal_email | varchar(127) | YES | | NULL | | | paypal_error_code | varchar(12) | YES | | NULL | | | item_detail | varchar(50) | YES | | NULL | | | band_id | int(10) unsigned | YES | MUL | NULL | | | account_id | int(10) unsigned | YES | MUL | NULL | | | share | int(10) unsigned | YES | | 0 | | | item_title | varchar(300) | YES | | NULL | | | item_title2 | varchar(300) | YES | | NULL | | | item_credit | varchar(100) | YES | | NULL | | | share_balance | int(11) | YES | | 0 | | | currency_rate | float | YES | | NULL | | | fan_receipt_date | datetime | YES | | NULL | | | band_receipt_date | datetime | YES | | NULL | | | preorder | tinyint(4) | YES | MUL | NULL | | | release_receipt_date | datetime | YES | | NULL | | +-------------------------+------------------+------+-----+---------+-------+
And looking into the statuses mentions above, there seem to be 8 status options and 3 payer_paypal_status options by executing a couple simple queries:
+----------+----------+ | status | count(*) | +----------+----------+ | checkout | 7793 | | declined | 145313 | | error | 38976 | | finish | 517 | | ignore | 1424 | | paid | 5422767 | | refund | 1616 | | start | 2539145 | +----------+----------+
+---------------------+----------+ | payer_paypal_status | count(*) | +---------------------+----------+ | NULL | 2561365 | | unverified | 2734676 | | verified | 2861602 | +---------------------+----------+
From a look at the source code it seems like the status should only be ‘paid’ if the transaction is complete. Although there are payments with both null and zero values that are indicated as ‘paid’. Also, looking at the attributes of the table paypal_auths, there doesn’t seem to be any obvious indication when a payment is pending. It would be useful to know when that is the case.
After this initial investigation into changing the PayPal fee, I hadn’t even started thinking about writing any code and it had become obvious that this task was going to be a lot more work than that of my first bug. So, Joe decided it would be a good idea to flush out a proper plan and put it up on Bandcamp’s wiki page. I came up with a list of steps that I thought needed to be done in order to complete this task. Then Joe and I set up a meeting to polish the plan and divide the steps into weekly goals.
In school, we are introduced to planning in our first year by being encouraged to make flow charts of our programs before we begin. Unfortunately, the lack of persistence in encouraging planning in the classroom results in students easily ignoring design until well into writing the code. In other engineering disciplines, most of their degree is spent on planning, but software engineering is so accessible and is such a fast moving industry that planning is often ignored and quality, lacking. In conclusion, make a plan first.
In the phone call, we discussed several key points to creating a good plan. The first mentioned above is having distinct goals within a plan. You are essentially giving yourself a set of accomplishments that you can hand in to your supervisor and say, “look what I did!” A plan, on the other hand, will provide the steps to help you get to each of these goals. It is important to set your goals in increments of similar difficulty, and this is where you need the skill of ‘sizing’ your work load. It can feel a lot like guess work to give an estimate of how long a project will take, in particular, how much time debugging will take. But, the more specific and detailed your plan is, the easier it will be to stay on track. The tears and hours spent over debugging poorly written code is pretty much a right of passage in your undergrad. Although you can quite easily avoid that stress by designing yourself a detailed timeline for solving the problem. In this case, I divided my work into weekly increments.
Finally, we talked about the kind of detail that I would need to discuss my objectives in the wiki. Since Bandcamp employees are spread out across the world, the emphasis on good communication is very high. This means that we don’t want to alienate different members of the team by using very specific language, but still fully communicate the idea. We came up with a persona who I would essentially write to any time I had to explain something, and the specific person in mind was my friend Alexa. She is a very bright astrophysics major (recently graduated!) who has an interest in programming, but doesn’t have a ton of experience. I will need to be able to explain to her what I’m trying to accomplish, how I’m going to do it, and what the project will look like at the end of each milestone, especially what will not be accomplished yet.
Here is what we came up with:
The first week looking into this problem will involve interpreting the values inserted into the table paypal_auths. The values inserted can be understood via notes in the code and the documentation on retruned values from PayPal. Instances where there is no value or the value is zero, there will be more problem solving involved for values like the fee, status and type. By the end of the week I should have a write up describing what the important values are, what it means when they are missing, how to use them, and how they are inserted into the table.
GOAL: Understand PayPal transactions
This is the first of two weeks that will be investigating how to receive the IPNs sent by PayPal. No coding will be done this week. All efforts will be focused on understanding the content of an IPN and how bandcamp currently interprets IPNs. Documentation of all findings will be provided at the end of the week.
GOAL: Preparing to get IPNs
This is the second week of the IPN investigation and we should know enough by now to write the code to read in the IPN sent when the transaction has finally completed. At this point we should know what the code needs to do and where the changes need to be made. No interpretation of the data will be done, simply reading it in and printing it to the logs.
GOAL: Get IPNs
Focus on altering the paypal_auth table this week. I will add two columns that will include the pending payment reason and the date of the completed transaction. In order to make a smooth transition, transactions previous to the date these columns are added will need to be filled. Whether the entry is null, 0, or a value will need to be determined this week.
GOAL: Migrate paypal_auths table
This week we combine the efforts from the results of week 3 and 4 so that the IPN that we are now receiving is inserted into the improved paypal_auths table. In order to do this, we will need to see where in the code insertions into the paypal_auths table are occuring and will need to update the code accordingly. At the end of this week, everything back end should be doing what it is supposed to, including storing the information we need to update the bands’ sales records.
GOAL: Update database with info returned from IPN
Final week: actually getting the updated fee information to the UI. The value of the PayPal fee should directly translate from the database since no new information is needed, but how the data is pulled to the CSV will need to be understood prior to this update.
GOAL: Update CSV
I’m feeling pretty good about keeping on track with this plan, but you never know what I might run into. Keep an eye out for more blogging about my progress throughout the next 6 weeks!