ctabxl - version 1.2
Correlation tables are widely used in academic research. To make preparing them easy, ctabxl tabulates Pearson and Spearman correlations in Excel. The Excel output can be easily sent to coauthors to communicate findings or copied to Word to produce elegant tables worthy of even the most prestigious academic journals.
Installing and updating
To install the current version of ctabxl, use the following code to search for zach.prof in Stata:
search zach.prof
Next click the link for ctabxl. Then click the link to install ctabxl.
After installation, you can type help ctabxl in Stata to view a comprehensive help file. You can also read the remainder of this webpage for examples that illustrate all of ctabxl's features.
If you have an old version of ctabxl on your computer and want to update to the current version, run the following code in Stata to uninstall the old version:
net uninstall ctabxl
Then run the search command from above and follow the links to install the current version.
You can alternatively use the following net install commands to install any current or non-current version of ctabxl:
net install ctabxl, from("https://raw.githubusercontent.com/zachprof/ctabxl/1.2") // installs version 1.2 (current)
net install ctabxl, from("https://raw.githubusercontent.com/zachprof/ctabxl/1.1") // installs version 1.1
net install ctabxl, from("https://raw.githubusercontent.com/zachprof/ctabxl/1.0") // installs version 1.0
Getting started
To demonstrate ctabxl's functionality, I am going to use the auto.dta dataset. This dataset comes with Stata and allows any Stata user to replicate my examples exactly as they are presented below. You can also access the code in my examples by downloading ctabxl examples.do.
To load auto.dta, type the following two commands in Stata:
clear
sysuse auto
Once auto.dta is loaded, I recommend using the cd command to change your working directory to an easy to find location. The location you specify is where ctabxl will save correlation tables by default. As an example, I might type the following in Stata's command window to change my working directory to my desktop:
cd "C:\Users\zacha\Desktop"
To make a correlation table, ctabxl requires (1) the variables you want in your table in the order you want them tabulated followed by (2) the word "using" and the name of the file you want your table saved in. For example, to learn if larger cars cost more, weigh more, and get worse gas mileage, you might type the following in Stata:
ctabxl headroom length price weight mpg using "Auto"
This will produce a correlation table examining price, weight, gas mileage, and two measures of size (headroom and length) in a file named Auto.xlsx saved in the location specified earlier using the cd command. The correlation table can be opened by navigating in your computer to the location specified using the cd command or by clicking a link that appears in Stata after running ctabxl (the link will say "click here to open Excel output"). If you open the Excel output, the correlation table should look like this:
74 observations are used to calculate these correlations and the table is organized in a very generic way. Pearson correlations are in the bottom triangle, Spearman correlations are in the top triangle, *'s are used to indicate statistical significance at the p-value < 0.05 level, and ones are used to designate the main diagonal.
Before moving on, you can check the accuracy of these results by typing the underlying correlation commands directly in Stata as follows:
pwcorr headroom length price weight mpg, sig obs
spearman headroom length price weight mpg, stats(rho p obs) pw
pwcorr is the command ctabxl uses to identify Pearson correlations, spearman the command ctabxl uses to identify Spearman correlations, and typing the code above produces the outputs necessary to verify the accuracy of observation numbers, correlation coefficients, and levels of statistical significance.
To demonstrate a few of ctabxl's options, note almost all correlations are significant at the 0.05 level. A natural question is whether any are significant at higher levels. To answer this question, try running the following four lines of code and opening the new file it creates named "Auto Diff Sig.xlsx":
ctabxl headroom length price weight mpg using "Auto Diff Sig", sig(0.05) sheetname(C1) tablename(Sig. 0.05)
ctabxl headroom length price weight mpg using "Auto Diff Sig", sig(0.01) sheetname(C2) tablename(Sig. 0.01)
ctabxl headroom length price weight mpg using "Auto Diff Sig", sig(0.001) sheetname(C3) tablename(Sig 0.001)
ctabxl headroom length price weight mpg using "Auto Diff Sig", sig(0.0001) sheetname(C4) tablename(Sig 0.0001)
"Auto Diff Sig.xlsx" should contain four sheets, C1, C2, C3, and C4, with different tables that evaluate statistical significance at the 0.05, 0.01, 0.001, and 0.0001 levels. You'll find that all correlations significant at the 0.05 level are also significant at the 0.01 and 0.001 levels. However, sheets C3 and C4 reveal Pearson correlations between (1) price and length and (2) mpg and headroom are significant at the 0.001 level but not the 0.0001 level.
If it's standard in your field or target journal to use a significance level other than 0.05, you can also use the sig option by itself. However, to avoid creating more new spreadsheets, let's also use the replace option to overwrite the spreadsheet created in my first example. To do this, try the following command:
ctabxl headroom length price weight mpg using "Auto", sig(0.01) replace
This overwrites the Correlations sheet in Auto.xlsx with a new table that evaluates statistical significance at the 0.01 level rather than the 0.05 level.
You can also base statistical significance on Bonferroni and Sidak adjusted p-values using the bonferroni and sidak options. To test these options, try the following code:
ctabxl headroom length price weight mpg using "Auto", sig(0.001) bonferroni replace
This overwrites the Correlations sheet in Auto.xlsx with a new table that evaluates statistical significance at the 0.001 level based on Bonferroni adjusted p-values.
ctabxl headroom length price weight mpg using "Auto", sig(0.001) sidak replace
This overwrites the Correlations sheet in Auto.xlsx with a new table that evaluates statistical significance at the 0.001 level based on Sidak adjusted p-values.
Note the bonferroni and sidak options cannot be used together.
Finally, you might find in a project that two decimal places aren't informative. For example, if correlations are as small as 0.01 or 0.00, you may want to round to three or four decimal places. You can do this with the roundto option in ctabxl as follows:
ctabxl headroom length price weight mpg using "Auto", sig(0.01) roundto(4) replace
This overwrites the Correlations sheet in Auto.xlsx with a new table that evaluates statistical significance at the 0.01 level (unadjusted) and rounds to four rather than two decimal places. You can also round to different decimal places by replacing four with other integers from zero to 26.
This concludes my getting started tutorial. You can continue reading for more examples that demonstrate all of ctabxl's other options. You can also type help ctabxl in Stata to view a comprehensive help file containing all the information you need to put ctabxl to use tabulating beautiful correlation tables in your next paper.
Technical Note: If you want ctabxl to save correlation tables to a location other than your working directory, you can do so by specifying the path and file name together. For example, the following code saves the initial correlation table created in my first example to my downloads folder:
ctabxl headroom length price weight mpg using "C:\Users\zacha\Downloads\Auto"
Applying sample restrictions
You can use if and in as you normally would in any other Stata command to restrict the sample used in calculating correlations. For instance, continuing the example from above, you could type the following in Stata to examine correlations for non-US car manufacturers:
ctabxl headroom length price weight mpg using "Auto" if foreign == 1, sig(0.01) roundto(4) replace
If you open Auto.xlsx, you'll now see the sample restricted to 22 cars in auto.dta manufactured by non-US companies, such as Audi and Subaru.
In changing the sample, an important issue to be aware of is that ctabxl tabulates pairwise correlations by default. This means, within the sample defined by if and in, ctabxl uses all observations with nonmissing values of a pair of variables even if other variables have missing values for the same observation. Thus far, the distinction between pairwise and nonpairwise correlations has been moot because headroom, length, price, weight, and mpg are nonmissing for every observation in the sample. However, this isn't the case for rep78. To see what happens when missing values are introduced, type the following in Stata:
ctabxl headroom length price weight mpg rep78 using "Auto" if foreign == 1, sig(0.01) roundto(4) replace
rep78 is missing for one observation within the subsample of non-US car manufacturers. Therefore, you'll see that instead of reporting "Number of observations: 22 (same for every correlation coefficient)", ctabxl outputs a second table to Excel reporting the sample size for each correlation coefficient as follows:
This table indicates that the full sample of 22 cars manufactured by foreign companies is used to calculate every correlation coefficient except those involving rep78. In contrast, correlations involving rep78 are calculated using only the 21 observations where rep78 is not missing.
It is typical to calculate correlations this way for academic papers because, if a variable used in a supplementary test is missing for part of the sample, you generally want to calculate correlations for the full sample if possible. However, if you want nonpairwise correlations, you can calculate them with the nopw option as follows:
ctabxl headroom length price weight mpg rep78 using "Auto" if foreign == 1, sig(0.01) roundto(4) nopw replace
If you run this code and open Auto.xlsx, you'll notice the sample size table is omitted because all correlations are calculated using the 21 observations where rep78 is not missing, even for correlations not involving rep78.
Finally, in some datasets, you might find variables set to zero for a large number of observations. In these cases, it may be useful to evaluate nonzero values only. This can be done with the nozeros option in ctabxl. To demonstrate, let's first set some observations in auto.dta to zero with the following command:
replace headroom = 0 if _n <= 10
This sets the variable headroom equal to zero for the first ten observations in the dataset. Since these observations are all related to US car manufacturers, we'll have to remove the foreign == 1 restriction and, while we're at it, let's also remove the sig, roundto, and nopw options, and rep78 since they're not relevant to this example. You can now use the nozeros option as follows to produce a correlation table that excludes the ten observations where headroom is set to zero:
ctabxl headroom length price weight mpg using "Auto", nozeros replace
I've now shown you all the tools available for restricting the sample in ctabxl. Remember, you can always type help ctabxl in Stata if you need a refresher or would like to review ctabxl's other functions on your own. If you continue reading, the next section covers additional formatting options not covered in the getting started tutorial.
Additional formatting options
Many published articles use ctabxl's standard formatting of Pearson correlations in the bottom triangle, Spearman correlations in the top triangle, *'s to indicate statistical significance, and ones to designate the main diagonal. However, you may want to format your correlation table differently depending on your target journal or field of study. Therefore, I've included options in ctabxl that allow you to format correlation tables just about any way you want.
Before reading about these options, it's a good idea to reload auto.dta since you may have changed the dataset in the previous example. You can do so by typing the following two commands in Stata:
clear
sysuse auto
Perhaps the most common deviation from ctabxl's standard formatting is to use boldfaced and italicized text in combination or separately with or without stars to indicate statistical significance. Alternative formats such as these can be achieved using the bold, italic, and nostars options. The following code illustrates:
ctabxl headroom length price weight mpg using "Auto", bold replace
ctabxl headroom length price weight mpg using "Auto", italic replace
ctabxl headroom length price weight mpg using "Auto", bold italic nostars replace
ctabxl headroom length price weight mpg using "Auto", bold nostars replace
If you run each line of code and open Auto.xlsx before running the next line, you'll see first that specifying the bold option by itself results in boldfaced text with one star. Next, specifying the italic option by itself results in italicized text with one star. Third, specifying the bold, italic, and nostars options together results in boldfaced and italicized text with no stars. Last, specifying the bold and nostars options together results in boldfaced text with no stars. Finally, it is worth noting that you can use the bold, italic, and nostars options in any combination, not just those suggested above. Examples of every possible combination are provided in ctabxl examples.do.
The next major formatting change you may be interested in is changing which correlations are reported and in which location. This can be done with the pearsononly, spearmanonly, and pearsonupper options. Unlike the bold, italic, and nostars options, the pearsononly, spearmanonly, and pearsonupper options cannot be used in combination. pearsononly results in Pearson correlations in the bottom triangle, leaving the upper triangle empty. spearmanonly results in Spearman correlations in the bottom triangle, leaving the upper triangle empty. pearsonupper results in Spearman correlations in the bottom triangle and Pearson correlations in the upper triangle. Thus, the pearsononly, spearmanonly, and pearsonupper options allow you to put Pearson and Spearman correlations wherever you might typically want them in a research paper. To see these options in action, you can type the following lines of code one at a time in Stata:
ctabxl headroom length price weight mpg using "Auto", pearsononly replace
ctabxl headroom length price weight mpg using "Auto", spearmanonly replace
ctabxl headroom length price weight mpg using "Auto", pearsonupper replace
Another major formatting option available in ctabxl is noones, which removes ones from the main diagonal. You can test this option with the following code:
ctabxl headroom length price weight mpg using "Auto", noones replace
If you open Auto.xlsx after running this code, you'll find the main diagonal of the correlation matrix is now blank.
The last major formatting option available in ctabxl is 3stars, which allows you to use one, two, and three stars to indicate different levels of statistical significance. This option cannot be used in combination with the bold, italic, or nostars options, and it requires you to specify three levels of significance in any order. The largest level will automatically be assigned one star, the smallest level will automatically be assigned three stars, and the other level will automatically be assigned two stars, no matter what order you enter them in. To demonstrate, try typing the following command in Stata:
ctabxl headroom length price weight mpg using "Auto", noones 3stars(0.001 0.01 0.0001) replace
At this point, we've covered almost all of ctabxl's options. As always, remember to type help ctabxl in Stata any time you need a refresher. The last section covers how to add extra rows and columns to your correlation tables, which can be helpful for copying tables from Excel to Word.
Adding extra rows and columns
The last two options available in ctabxl are extrarows and extracols. These options aren't useful if you're tabulating correlations in Excel to send to coauthors. However, they can really come in handy if you're trying to copy correlations from Excel to Word. To illustrate, consider the following correlation table prepared in Word:
To get the spacing right, I added one extra row and one extra column between every correlation coefficient. Therefore, you would need to manually add extra rows and columns in Excel, or paste each correlation coefficient one by one to avoid significant formatting issues in Word. To more easily copy correlations from Excel to Word, you could alternatively use the extrarows and extracols options as follows:
ctabxl price weight mpg using "Auto", nostars bold italic extrarows(1) extracols(1) noones replace
After running this code, you would want to open Auto.xlsx and copy the correlation coefficients as shown here:
Finally, to populate the table in Word, you could open the Word document and use the "Overwrite Cells" paste option as shown here:
The formatting looks a little funny in the image above because I haven't actually clicked the "Overwrite Cells" button. However, after clicking "Overwrite Cells", the table should go right back to the original spacing shown at the top of this section. To try for yourself, download a blank correlation table here: correlation table.docx.
If you ever decide to use more than one extra row or column in a project, keep in mind the extrarows and extracols options will accept any integer from one to ten.
In addition, note that when you use the "Overwrite Cells" paste option the font in Excel is transferred to Word. This is critical when using boldfaced and/or italicized font to indicate statistical significance. However, if you are not using boldfaced or italicized font, it may be preferable to use the "Keep Text Only" paste option.
If you've read every section on this page, give yourself a pat on the back. You now know how to use all of ctabxl's options. If you ever forget anything you learned here or need to quickly reference something, you can always type help ctabxl in Stata.
Happy coding!
Acknowledgements: I owe a special thanks to Derek Christensen and Clay Partridge for carefully reviewing and providing helpful feedback on version 1.0 of this code, the complementary help file, and the documentation on this page. I also thank Derek Christensen for suggesting the replace option and Svenja Dube for suggesting the 3stars option, both of which have also been incorporated in dtabxl.
Code access: The files underlying every version of this code as well as information about what changed from one version to the next are readily accessible on GitHub.