Nulls and Zeros

(Discover some really great music by listening to the Bandcamp weekly!)


Although we know that pending transactions will not have an associated fee value, we are a little uncertain about what a pending transaction will look like in the sales records displayed for a band. Currently we get three categories of fees: ‘0’, NULL, or some float value. So the main question is what does it mean when there is a zero or null value in the table? In the database there are many instances of both null and zero fee values:

+---------+--------+
| fee_amt | count% |
+---------+--------+
|    NULL |   33.7 |
|       0 |    2.6 |
+---------+--------+

What we want to know is if this reflects the values we are seeing in the exported CSV files.
Well, first off, what does it mean to have a value inserted into the paypal_auths table where the fee value is stored? Everytime a payment is initiated and information is recorded, then an entry must be made in the table. This means that if someone puts an item in their cart, clicks ‘checkout’, and enters their personal information, there will be a record of that before the actual payment is processed. At least in cases where the transaction is terminated after this initial step, no PayPal fee will have been calculated because no purchase will have been made. In order to identify these kinds of payments I looked at the types of status values that were assigned to each payment:

select status, count(*) from paypal_auths group by status;
+----------+--------+
| status   | count% |
+----------+--------+
| checkout |    .09 |
| declined |   1.78 |
| error    |    .48 |
| finish   |    .01 |
| ignore   |    .02 |
| paid     |  66.46 |
| refund   |    .02 |
| start    |  31.15 |
+----------+--------+

Turns out these are values that are assigned to the transaction depending on how far through the code the transaction has hit. So we have 8 different statuses for payments. The start, checkout, finish and paid statuses represent stages of the payment with respect to the site’s interaction with PayPal and are assigned in the code. The declined and error statuses are given to incomplete payments for errors to do with the PayPal side of things, while the refund and ignore statuses, used sparingly, are manually set by BC to payments that should not show up as paid.
There is also a status called payer_paypal_status with three options: verified, unverified, and NULL. I was interested what the two types of statuses have in common. What stands out is that where the PayPal status is verified, there are no ‘start’ statuses. So in order for a payment to become verified it must have at least been initiated by the Bandcamp site and sent to and returned from the PayPal site before executing the payment. Good to know, but not really that helpful for us. Since the PayPal statuses are less detailed I decided to focus my attention on the null and zero values with respect to the Bandcamp set status. First, looking at the status options for payments that had 0-valued fees,

SELECT status, COUNT(*) 
FROM paypal_auths 
WHERE fee_amt=0 
GROUP BY status;
+--------+---------+
| status | count%  |
+--------+---------+
| error  |    .003 |
| ignore |    .07  |
| paid   |  99.55  |
| refund |    .38  |
+--------+---------+

then null-valued fees.

SELECT status, COUNT(*) 
FROM paypal_auths 
WHERE fee_amt IS NULL 
GROUP BY status;
+----------+---------+
| status   | count%  |
+----------+---------+
| checkout |    .29  |
| declined |   5.31  |
| error    |   1.42  |
| finish   |    .02  |
| paid     |    .002 |
| refund   |    .002 |
| start    |  92.95  |
+----------+---------+

A bit more interesting, so I looked at each status and how they were associated with fees based on three different categories: 0, null, or other fee values. In fact, in several states there was never a fee amount that wasn’t 0 or null. For example, ‘checkout’, ‘declined’, ‘finish’ and ‘start’ only ever have a fee of null. Those with ‘error’ status can either be zero or null, but more notable is that the only statuses that can have a value which isn’t null or zero are ‘ignore’, ‘refund’, and ‘paid’. (Promise I’m going somewhere with this.) These are important because ‘paid’ is the final stage of the transaction where you would expect the fee to be filled in and ‘ignore’ and ‘refund’ are the statuses that are changed manually by Bandcamp employees in unusual and uncommon circumstances. Hizzah! Things are starting to make sense.

Screen Shot 2013-07-16 at 2.58.54 PM

We’re about half way there. Here is a bit of comic relief for your patience. (referencing the xkcd comic I linked in ‘The Race Condition’)

Next step: how does this relate to the values seen in the CSV records? Well, from above, lots of paid transactions have fee values of zero (205267), but very few are null (44). Though in the CSV files it would seem that there are quite a few nulls, they even seem to appear more frequently than zero values. Ideally there should only be ‘paid’ transactions in the sales report, so what’s going on there?

Fairly randomly selecting attributes to look at, I queried for the countries that the paid transactions with null fee values were coming from. That didn’t show anything unusual, so I looked at what types of purchases were being made and both digital and merchandise orders were culprits, so also not unusual. Though considering there are millions of transaction records in the database, it is fairly unlikely that the 44 null values with a paid status represent the null values in the bands sale records so investigating them further would surely be fruitless.

I started looking at specific column values for any null fee values in the table. I ended up querying the payment types and payment date, which, apart from one entry, were both always null when the fee value was null. Looks like the null payment fees come from dud payments!

SELECT paypal_payment_type, paypal_payment_date, COUNT(*) 
FROM paypal_auths 
WHERE fee_amt IS NULL 
GROUP BY paypal_payment_type, paypal_payment_date;
+---------------------+---------------------+--------+
| paypal_payment_type | paypal_payment_date | count% |
+---------------------+---------------------+--------+
| NULL                | NULL                |    100 |
+---------------------+---------------------+--------+

Alright, now we’re getting somewhere. So null fee values are essentially incomplete transactions that will never be seen in the UI. This means that only 0 and float valued fee entries will show up in the table. It would appear that the problem is somewhere in the translation between the database tables to the CSV. So I went to investigate specific examples from the CSV where the fee value was null. Perhaps the missing fee value has something to do with inserting the values into the CSV. Using an email from a band’s sales record and the date of the transaction I searched the table:

SELECT fee_amt, paypal_payment_type, payment_date 
FROM paypal_auths 
WHERE payer_paypal_email='me@bandcamp.com' 
    AND status='paid' 
    AND payment_date='2012-03-04 22:30:26';
+---------+---------------------+---------------------+
| fee_amt | paypal_payment_type | payment_date        |
+---------+---------------------+---------------------+
|    0.36 | instant             | 2012-03-04 22:30:26 |
+---------+---------------------+---------------------+

A fee does exist! So I checked another email address where the fee was null. A fee value also existed for that buyer. Though, looking at instances where the fee value was zero in the spreadsheet, there was also a value of zero in the table. So the zero value made sense but the null value didn’t yet. Also notable from these investigations was that of the two types of payments, instant and echeck, all echeck payments had a fee value of 0. This means that once we start receiving the IPN, then those 0 entries should slowly turn into accurate values. But what about the null values we were seeing?

Let’s check out some more null payments on the CSV. While blankly staring at an excel spreadsheet of existing sales records, I started noticing that whenever there was a group of null fee value table entries, the same buyer email was listed for each. This meant that someone who was making a lot of purchases was not receiving the fee amount for many of the purchases. WOOAAAH, I think I know why the PayPal fee doesn’t show up! All the purchases were made at the exact same time, indicating that only one payment was made. Upon checking the sales records again it looked like a fee value was shown in the correct field for only one of the purchases made during the single payment. This should be consistent with all purchases of 1 or more items since Bandcamp only lets you make a payment with items from the same artist if multiple items are in your cart. This means that the Null value in the sales records isn’t really a value at all, but more of a placement filler for large order purchases. See an example below.

Screen Shot 2013-07-16 at 9.14.33 AM

We probably won’t change the null values in the sales records because technically all the information is there, but we may add some notes in the Bandcamp faq. Otherwise the 0 values will simply be the placement holders for pending transactions until they are updated with the information via IPNs. So next step, how do we go about receiving IPNs?

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s