The USDA recently released its annual report on child-rearing expenditures, and this has led to the following chart (from the report) being reproduced in various news and blog articles:

[Pie charts from the USDA report: shares of child-rearing expenditures, 1960 vs. 2013]

This use of pie charts is offensive to my senses, but not in the way that comma splices make me twitch.  In this case (unlike with most comma splices), it actually matters.  Case in point:  Not long after I reposted one such article to my Facebook, someone commented that the article’s author should have pointed out that the differences were due to women entering the workforce (note: the USDA report does cite this as a reason for the increase in the cost of education and childcare).  And, because of the misuse of pie charts, I could see how a casual observer might think that is the ONLY notable change in costs in the last 53 years.  

Alas, no.  Why not?  Because the pies are not the same size.  So percentages of the pie are nearly meaningless to compare to one another.

Why do you almost never want a pie chart?  The only information a pie chart gives you is proportions of a whole.  So pie charts should only be used if (1) the total sizes of the pools being compared are the same, or (2) all you care about is proportion and the pool size doesn’t matter.

Displaying the same data as above as a bar chart yields the following:


This chart makes it much easier to see the differences in all budget categories in real (adjusted) dollars between 1960 and 2013 expenditures.  Note that while the housing category was roughly equal in the pie charts, there is actually a notable increase between 1960 and 2013 housing costs.  This is because 30-31% of ~$200K is quite a bit less than 30-31% of ~$248K.  This is what I mean when I say that the pies are different sizes.  It’s like giving one person 1/3 of a medium pizza, giving someone else 1/3 of a large pizza, and then just saying that you gave each person 1/3 of a pizza when asked how much food you supplied.    
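The different-sized-pies arithmetic is easy to check directly.  Here’s a quick sketch using the approximate figures from the charts (the ~$200K and ~$248K totals and the 30–31% housing shares; exact values are in the USDA report):

```python
# Approximate inflation-adjusted totals and housing shares from the charts
total_1960, share_1960 = 200_000, 0.31
total_2013, share_2013 = 248_000, 0.30

housing_1960 = share_1960 * total_1960  # ≈ $62,000
housing_2013 = share_2013 * total_2013  # ≈ $74,400

# A "roughly equal" slice of the pie hides a real-dollar increase of ~$12,400
print(housing_2013 - housing_1960)
```

Same slice of the pie, noticeably more pizza.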

But wait, you might say, what if I care about proportions as well as real differences?  Well, there is a chart for that too:


Now you can see the difference in sizes between each of the categories (as well as the difference in totals), in addition to differences in proportions (i.e., each category relative to its total height).  Alternatively, you can use a bar chart to display raw differences and a pie chart (or a stacked bar chart by percentage, which I prefer to pies) when discussing proportions.

In this case, personally, I don’t really see how proportion is particularly relevant.  How much is spent relative to the budgetary whole doesn’t yield much useful information beyond that given by raw differences (while leaving out the important info that raw differences give).  Moreover, displaying the data ONLY in terms of proportion is downright (though probably unintentionally) misleading.  For example, if one category were to change in real expenditures between time points while all others remained equal, then ALL proportions would change, which would make it seem like things have changed much more drastically than they have.  Let’s say that all that had changed between 1960 and the present was the drastic increase in childcare costs, presumably due mostly to women working outside the home rather than as unpaid childcare workers.  Our bar chart would tell us this accurately:


Pie charts would tell a different (misleading) tale entirely:


Similarly, if something DOES change in real dollars but remains a roughly equal share of a whole that otherwise changes quite a bit (as in the case of our actual USDA data with housing costs), displaying in a pie chart obscures that change.
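The distortion is easy to demonstrate numerically.  In this sketch (all budget numbers invented for illustration), only childcare changes in real dollars between the two years, yet every category’s share of the pie shifts:

```python
# Invented budgets (in $K): only childcare differs between the two years
budget_1960 = {"housing": 60, "food": 40, "childcare": 5,  "other": 95}
budget_2013 = {"housing": 60, "food": 40, "childcare": 45, "other": 95}

def shares(budget):
    """Each category's proportion of the total budget."""
    total = sum(budget.values())
    return {k: v / total for k, v in budget.items()}

s60, s13 = shares(budget_1960), shares(budget_2013)

# Every share moves, even though only one category changed in real dollars
changed = [k for k in budget_1960 if abs(s13[k] - s60[k]) > 0.001]
print(changed)  # all four categories
```

A pie chart of those shares would imply that housing, food, and “other” all shrank, when in fact not one of them changed.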

Moral of the story?  Choose your charts wisely.  There are lots of guidelines out there for visual display of quantitative information.  And when in doubt, avoid the pie.

So, like any graduate student, I often get asked the question, “When are you going to be done?” And like any graduate student, I become less enamored with the question every time I hear it.

But! You say. Surely you don’t mind now that you REALLY. ARE. DONE?!

Er, I’m not done.  I mean, I’m done, but I’m not done.  I mean… graduate school doesn’t end; it tapers off.

I’ll show you what I mean.


I think I’ve realized why LiveJournal and similar types of blogging* are dying, even for those for whom it was once practically a life necessity.  It has to do with this research (with which I have been familiar for some time–Pennebaker came and gave a talk here–but it came up in a recent colloq again, so it has been rattling around in my brain lately).

Basically, the effect I’m thinking about is this:  “Both projects found that with increasing age, individuals use more positive and fewer negative affect words, use fewer self-references, use more future-tense and fewer past-tense verbs, and demonstrate a general pattern of increasing cognitive complexity.”
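For the curious, analyses like Pennebaker’s boil down to counting words by category.  Here’s a toy sketch of the idea (the word lists are invented and nothing like the real LIWC dictionaries):

```python
# Toy LIWC-style word tally (category word lists invented for illustration)
POSITIVE = {"good", "happy", "great"}
NEGATIVE = {"bad", "sad", "awful"}
SELF_REF = {"i", "me", "my", "myself"}

def tally(text):
    """Count how many words in the text fall into each category."""
    words = text.lower().split()
    return {
        "positive": sum(w in POSITIVE for w in words),
        "negative": sum(w in NEGATIVE for w in words),
        "self_ref": sum(w in SELF_REF for w in words),
    }

print(tally("I had an awful day and my boss was bad to me"))
```

Run over thousands of journal entries and bucketed by author age, counts like these are what produce the patterns quoted above.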

My argument hinges on the idea that LiveJournal is out of fashion, so it doesn’t get many new, young users (in the US/English-speaking world).  New, young users hop on Twitter or Facebook or Pinterest or whatever is cool today (nearly all of which involve massive numbers of short and/or multimedia postings rather than longer, textual entries).  So, the vast majority of LiveJournal users are aging out of the kind of communication that is best suited to the form.

So let’s break it down—with increasing age, individuals…

– Use more positive and fewer negative affect words: We all know that LiveJournal is (or was) for whining.  Very few people post long entries about how well their life is going—and if they do, it probably doesn’t make for very interesting reading unless it involves some unusual circumstances (e.g., “Here’s the story of how I won $HUGE_SUM in Vegas!”).  There is no story to tell when things are just good for no particular reason, or because nothing bad has happened recently. But if the above research is true… as you get older, you tend to whine less (and maybe count your proverbial blessings more).  So you feel less need to hop on th’old El Jay to gin up some sympathy for your latest bad day.


– Use fewer self-references:  This is probably the biggest one.  You talk about yourself less as you get older.  But these barely-Web-2.0 blogs** were the very beginning of self-centered social networking, and once you start talking about something other than the microcosm of your ultimately mundane existence, your journal has become something else (i.e., a blog that someone other than your closest friends might actually want to read).  The vast majority of my former LJ friends who are still blogging have established a blog “topic” or are involved in hobby/fandom/etc. communities.  That is, we’re not 30-to-40-plus and still writing, in excruciating detail, about everything we did this weekend… we don’t feel the urge to talk about ourselves in great detail, to everyone who will listen, multiple times per day.

You might point out that many people in this age range have merely moved their social-network activity to another milieu, like Facebook.  True… but I would be highly interested in seeing a Pennebaker-style analysis of FB posts by age (which I’m sure has been done, many many many times, but I’m already procrastinating by writing this post in the first place).  My guess is that FB use differs across the lifespan.  Anecdotally, most people I know in my age cohort use FB to keep friends and family up to date on our kids more than ourselves.  My childless friends tend to fall off the social-networking grid for long periods of time, unless they have topical information to share.  With the Presidential election coming up, there are a lot of people talking about politics… but not about themselves.

– Use more future-tense and fewer past-tense verbs:  Here’s a case where Twitter/Facebook is much more suited to “older” than “younger” communication—When speaking in the future tense, “We’re going to Disneyworld next month!” is pretty much all there is to say.  Nobody wants to know your day-by-day itinerary.  Past tense is for telling stories, and stories require a longer form than those currently in vogue. So our more future-oriented language style finds a natural home on Facebook/Twitter; with fewer past-tense (and self-referential) stories to tell, we don’t miss the characters or feel a need to wander back to LiveJournal.

– Demonstrate a general pattern of increasing cognitive complexity: …I think this one is why people who were once avid LiveJournalers tend to lament its passing from time to time, especially those who have not found a good replacement outlet for thoughts more complex than a tweet.  But in our current social and economic climate, once you’re old enough to have “real” responsibilities, it’s also difficult to find time for such expressions.  Of course, even as our thoughts become more complex and more potentially interesting to other people… we’re less compelled to broadcast them to the world.

* By this, I mean LiveJournal or similar long-format, highly-socially-networked blogging.  This is not to be confused with either microblogging (e.g., Twitter, Facebook) or blogs that are more “broadcast” style, where people subscribe to you because they want to read articles on a certain topic or from a certain perspective (rather than because they want to interact with you, personally, in quasi-real-time and keep up with the minutiae of your everyday life). The aforementioned type is hereafter referred to as “LiveJournal,” much as facial tissues are often referred to as “Kleenex” regardless of brand.

** Shouldn’t we be on, like, Web 5.7 or something by now?

Dissertation overview

Edit:  This is a little better.

Here is a PDF.

Genes and Environment

January 27, 2012

In my Biology and Cognitive Science of Communication course last year, one of the central themes–one of perhaps 5 ideas that I really hoped students would understand by the end of the course and take away with them–is the notion that genes/innate biology and environment have, for the most part, mutual effects on outcomes (including/especially behavior) that cannot be parceled out into X% and Y%.  That is, the interaction between genes and environment is usually much more important than either by itself.

Of course, this is not always true.  There are extreme versions of both–single-gene traits (e.g., color blindness, Huntington’s Disease, many others) and trisomies (e.g., Down Syndrome) obviously have a very large, direct effect on outcomes.  Likewise, extreme environmental circumstances (e.g., Fetal Alcohol Syndrome, head trauma, sensory deprivation) can have large, direct effects.

But for the most part, it’s an interaction–and that’s an easy, glib line to memorize, but I think it took students a while (and, for that matter, it took me a long time) to really grasp what that means.  But this morning I was perusing my new* 23andme results and I came across a good example.  Below the cut is a copy-paste of one of my results, “Response to Diet.”  This result reports on 3 SNPs (essentially, genes) that research shows to be associated with the link between diet and obesity.

Now, 23andme calls these “preliminary results” because each link only has one approved study that goes with it, and there are a host of statistical issues with the way that much research is done on gene effects (basically, it’s correlational data mining, so unless you have MANY studies showing the same association repeatedly, there’s always the possibility that a given result is a chance fluke rather than an actual relationship).  But let’s pretend for a moment that they’re associations that can be taken at face value.

Each of these genes has 3 known variants (this is not the only way it can happen, but it’s usual for the SNPs that 23andme reports).  Scientists doing gene studies compute odds of whatever outcome they’re interested in (e.g., obesity, Parkinson’s Disease, etc.), comparing the less usual variants (i.e., mutations) to the most common variant (i.e., “normal”).  Sometimes the mutation seems to be protective–it does better than the most common variant (e.g., it is correlated with lower odds of heart disease than the general population), but often the mutation is associated with higher odds of things you don’t want.
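To make the odds comparison concrete, here’s a toy odds-ratio calculation with invented counts (real studies also adjust for covariates and report confidence intervals):

```python
# Invented 2x2 table: outcome by genotype
#                  obese   not obese
# mutation           30        70
# common variant     20        80

odds_mutation = 30 / 70   # odds of obesity among mutation carriers
odds_common   = 20 / 80   # odds among carriers of the most common variant

odds_ratio = odds_mutation / odds_common
print(round(odds_ratio, 2))  # ≈ 1.71: higher odds for the mutation
```

An odds ratio above 1 means the mutation is associated with higher odds of the outcome; below 1 would suggest a protective effect.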

The results below the cut are interesting because they specifically relate to response to diet–that is, how my body interacts with the environment. Thus, these aren’t “obesity” genes in the simplistic sense–having a given variant causes obesity–but they are (perhaps) genes that determine how I metabolize different kinds of foods, and thus what kinds of foods are likely to help me lose weight or make me gain weight.**

So the genes themselves are not causing weight loss/gain.  Neither is my diet alone.  You can’t say it’s 30% genes and 70% environment.  It’s genes AND diet (environment).  The effects of the genes are conditional upon what food I eat, and the effect of food I eat is conditional upon my genes.  Their effects cannot be separated.
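Here’s a toy illustration of what “cannot be separated” means statistically (all numbers invented).  In this sketch, averaging over diets makes both genotypes look identical, and averaging over genotypes makes both diets look identical–the entire effect lives in the gene-by-diet combination:

```python
# Invented weight change (lbs) for each genotype x diet combination
effect = {
    ("variant_A", "low_carb"): -5,
    ("variant_A", "low_fat"):  +2,
    ("variant_B", "low_carb"): +2,
    ("variant_B", "low_fat"):  -5,
}

genes = ("variant_A", "variant_B")
diets = ("low_carb", "low_fat")

# "Main effects": average each factor over the levels of the other
gene_means = {g: sum(effect[g, d] for d in diets) / 2 for g in genes}
diet_means = {d: sum(effect[g, d] for g in genes) / 2 for d in diets}

print(gene_means)  # both genotypes look identical on average
print(diet_means)  # both diets look identical on average
```

In this extreme case, asking “what percent is genes and what percent is diet?” gives the useless answer “zero and zero”–even though the combination fully determines the outcome.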

I think this is also interesting to interpret the wide variety of fad diets out there, and how so many people can swear that a given fad diet is THE way to lose weight because it worked very, very well for them.  How can Atkins work for some people, and low-fat diets work for other people?  I suggest that those people have different genes like the ones below (assuming this research pans out).  This isn’t even that earth-shattering of an idea–maybe some people store carbs more easily than others, whereas some people store fats more easily than others.  Human metabolism is ridiculously complex–you learn the Krebs cycle in high-school or freshman Bio and think that’s complex?  That’s just baby stuff, and really scientists have only scratched the surface of figuring out how we do the biochemical wonders that we do.***  It’s no surprise, to me at least, that the wide range of individual differences in diet efficacy could have some underlying genetic cause.

And my particular results affirm why I have personally had success with low-fat diets and not with low-carb diets (that, and I’m also a carrier for MCAD deficiency–apparently even carriers have lower-than-typical MCAD levels, so eating very few carbs tends to do things like make me faint).

* If you are not familiar, 23andme is a company that, for a fee and a container of spit, tells you about your genes.  They only provide results that are based on some baseline level of expert-vetted (as well as peer-reviewed, published) research.  For that reason, though their information about my genes doesn’t change, they periodically update what given results MEAN based on new research.  You can also download the full results file that lists every gene they genotyped, which, despite not being a full sequence, is 8MB of text–that’s a lot.

** I phrase things this way because I am currently, um, larger than I would prefer to be–curse you, baby weight that never left!!–but the reverse is obviously also true if you are one of those folks who have trouble keeping weight on.

*** You know what’s scary, that I did not know prior to being married to someone in Big Pharma?  A great many drugs on the market have unknown or only theorized mechanisms.  They don’t really know how they work.  Drugs are developed by testing a bunch of molecules (usually, ones that have something in common with another drug/compound that is already known to work) to see which ones are biologically active, and out of those which ones do the things that we want with the fewest side effects (in a very simplistic nutshell).  It’s very much “throw stuff, see what sticks.”

[See below for my 23andme “Response to Diet” results]


APA style

November 22, 2011

In a hypothetical reality where I have the kind of free time to devote to nerdy sidequests (read: personal crusades), I would put serious effort into reforming APA formatting guidelines.

APA 6 already made some good headway in this area, especially in the reformatting of levels of heading.  I know that people who had already become set in their ways didn’t like that, because everyone hates memorizing new arbitrary rules of style.  But aside from recent modifications, APA formatting guidelines exist because of typewriters.  Before APA 6, you had a complicated set of headings that involved ALL CAPS and italics (formerly underline) because caps and underline were all you could do on a typewriter.  Now we have fancy-schmancy things like bold, so headings can actually stand out from the body text.  Yay!

But lots of things remain from the old typewriter days.  Does anyone know why we have a running head and define it on the first page?  It’s because in the old days scholars would send in typewritten manuscripts with a title page identifying author info.  The editor would pull off the title page and send the anonymous manuscript out to reviewers (by mail!), then get it back with actual red marks on it and need a way to match it back up with identifying information.  I know there are still a couple of Luddite journals out there that take paper submissions (I don’t know how many of them use APA style, however)–but by and large this is a completely unnecessary practice that continues out of tradition.

But who cares–how hard is it to just type up a running head?  No big.  What is perhaps more significant is the resistance of APA to putting figures and graphs inline with the text.  Really?  We still need to attach them to the end?  This practice dates to when you would have attached ACTUAL PHOTOGRAPHS, hand-drawn graphs and figures, output from a “supercomputer,” etc. to your typewritten manuscript (and they would be captured for print in the journal with a camera… with film).  Now it’s actually (arguably) easier to put these things inline than to attach them to the end of the manuscript–especially if you’re using reference-management software or automatic endnotes that seem to think they know better than you where things should be placed, making it difficult to put things in a single document AFTER the refs or endnotes.  Moreover, it is a LOT easier for reviewers to check your tables and figs when they are presented inline with the text than when they are at the end of the document, especially when they are reviewing docs on a computer or tablet (as is increasingly the case).

So yeah… this is my dream, this is my quest.  If I had the kind of time that Don Quixote did, anyway.

I’m doing a poor job of keeping up my blog (funny how teaching 2 classes and working on a dissertation will do that), but I wanted to share this link and this link to pharma chemist Derek Lowe talking about quack science and medicine, because it’s pretty much exactly what I would say. Especially the stuff about the quest for “natural.”

Natural isn’t necessarily good for you. Lots of natural things are poisonous. They don’t want to be eaten/ingested.

Also, a big point that Derek mentions just briefly but that I think is really important is the (false) idea that we are, by default, perfect and healthy. On this view, modern medical interventions, “lifestyle” problems, bad upbringing, or other human mistakes are required in order for poor mental or physical health to result, and we would all be happy, well-adjusted, and cancer- and disease-free if we would shun the evil ways of modern science/industry and get back to living/eating/bathing/parenting the way that Nature intended.

There’s this related idea out there that it is possible to optimize. That there is some ‘perfect’ way to live, and if you can just find that perfect way, you will have perfect health and well-being. I actually think this kind of thinking comes about from living in a technological society surrounded by things that are designed to function in a particular way, and if you use them correctly, they’re not supposed to break (and if they do, you can collect on that warranty and get a new one). But our bodies aren’t warrantied. They’re not even designed to work properly. They’re designed (by natural selection) to work WELL ENOUGH, MOST OF THE TIME, to have babies that survive to do the same thing. That’s it.

And nature can only work with what it’s given. It doesn’t design out of whole cloth. We’re really just a set of kludgy hacks–unlike in software development, nature can’t say “Screw backwards compatibility–we’re going to do a complete architectural redesign with the next version, because that will let us have better features and performance!” Nope, you get whatever features and performance are possible within the constraints of the current system. Slowly. Incrementally. And non-intelligently (so even if a better system is THEORETICALLY possible given the current system, if the cards don’t end up falling that way, it doesn’t happen).

So there aren’t guaranteed to be “natural” solutions to any of your problems… in fact, there aren’t guaranteed to be medical solutions (another weird thing I see a lot–anger if there is nothing a doctor can do for someone, even in non-life-threatening circumstances, because “they should be able to fix you”).

Personally, I can’t wait for cyborg bodies.

Right now there’s a bit of a storm in a teacup about a bill introduced in Tennessee–original home of the Scopes Monkey Trial–that putatively would protect teachers from being sanctioned if they refuse to teach evolution, or if they teach intelligent design, or if they don’t worship the Flying Spaghetti Monster.

On one hand, after reading the full text of the bill (it’s very short), I’m a little confused. The bill states that educators

“shall endeavor to create an environment … that encourages students to explore scientific questions, learn about scientific evidence, develop critical thinking skills, and respond appropriately and respectfully to differences of opinion about controversial issues”

and that

“teachers shall be permitted to help students understand, analyze, critique, and review in an objective manner the scientific strengths and scientific weaknesses of existing scientific theories covered in the course being taught,”

while explicitly clarifying that these protections do not apply to religious ideas (as if the number of times the word “scientific” is repeated does not make that clear).

So I’m having a hard time seeing what the fuss is about. Most stories seem to be downright misinterpreting it. One editorial concedes that the actual language of the bill refers to protecting science education and not religion–but then goes on to suggest that this is doublespeak that somehow contains a hidden loophole, and the intent is really “to replace scientific principle with religious ideology.” (Moreover, I’m not sure how a bill that encourages critical teaching and protects teachers from sanction for encouraging discussion–essentially, protecting their speech–is “an attack … on First Amendment guarantees of speech and religious freedoms.” I don’t know how this would be true even if this bill really DID explicitly allow/encourage teaching “alternative theories” to evolution, unless it required doing so.)

On the other hand, there is something really powerful about false controversy, and this bill explicitly uses evolution and global warming as examples of “controversial” science topics. Both happen to be excellent examples of false, intentionally manufactured scientific controversy where, in actuality, no scientific controversy and only political controversy exists.* This bill reifies these topics’ “controversial” label. In fact, what makes me skeptical about this bill is that (as many point out in these hand-wringing articles) if it is being honest about its intent, the bill is unnecessary. But if it is being disingenuous, I want to know what kind of voodoo turns

“This section only protects the teaching of scientific information, and shall not be construed to promote any religious or non-religious doctrine, promote discrimination for or against a particular set of religious beliefs or non-beliefs, or promote discrimination for or against religion or non-religion”

into something that promotes “an atmosphere in which myths and fairy tales would prevail over science.” I mean, I know I’m not a lawyer and all, but that is some truly amazing doublespeak.



* If you think I’m sounding like a paranoid conspiracy theorist here, read Frank Luntz’s leaked 2002 memo to G.W. Bush, especially pp. 137-143. Apparently Luntz has changed his tune, because he has started using his opinion-creation powers for good rather than evil.


October 14, 2010

I haven’t updated here in awhile, and I really do have things that I want to say. Trouble is, I’m supposed to be working on my dissertation proposal. Writer’s block is a drag, and I’m in arguably the most epic existential-crisis-producing stage of the academic doctorate. What do I do to mitigate the effects of an existential crisis? Surf the internet until my eyeballs fall out.

While engaging in such rampant time murder, I came across a brief online article that repeatedly included sentences of the following form: “There are multiple companies out there, which secretly exploit starving puppies in third-world nations.”* So. Let’s review that versus which.


That: I have some pants that are too big for me.

The that clause is called a restrictive clause. It modifies the preceding noun (in this case, pants) to specify certain boundary conditions (restrictions) that are important to the sentence’s meaning. Without this that clause, I just have some pants–which is true, but is not the point of this sentence.

Which: I gave some of my pants, which were too big for me, to Goodwill.

In this sentence, the point is that I gave some of my pants to Goodwill. It also happens that all of these pants that I gave to Goodwill were too big for me. For this reason, the “too big for me” clause is nonrestrictive. Nonrestrictive clauses are a type of nonessential element–a part of the sentence that you could leave out without changing the meaning of the sentence. For this reason, it is set off with commas.

The thing is, unless you’re a stickler prescriptivist, it doesn’t actually matter which word you use, because lots of very good professional writers just always use “which.” What does matter, in terms of meaning and understandability, is your comma usage. Unfortunately, that’s where MS Word’s too-smart-for-its-britches grammar check trips people up. I imagine the online article I came across was a victim of Word, which tells you to put a comma before “which” even if you’re using the clause restrictively. So to go back to my fake example, because it is set off with commas, the starving-puppies clause is nonrestrictive–in other words, that sentence means that all companies secretly exploit starving puppies in third-world nations.

Bottom line:

If you’re interested in using that/which “correctly”: use that where you don’t set off its clause with commas (i.e., in essential clauses), and which where you do (i.e., in nonessential clauses). If you have reasonably good comma usage, this rule of thumb will not fail you.

If you’re not that confident about your comma usage, but Word tells you that you need a comma before your “which,” don’t just blindly add the comma. Only set a phrase off with commas if it is giving you “extra” information about something else in the sentence.

* This is not an actual sentence from the article. Obviously.

Quick link…

August 13, 2010

This Jezebel post* asking whether students are crazier than ever includes a similar argument to my “Are students dumber than ever” post — yes and no. Yes, they’re more likely to have severe mental illness. No, it’s probably not caused by the stress of college life; it’s probably because more people, including people with mental illness, are getting access to higher education.

* [Warning: Jezebel uses f-bombs.]