I’m a big fan of R, and it will be my primary tool for a long time, but I wanted to add another tool to my toolbox and decided on Stata. Stata 13 was just released (June 2013), and I have to say that it’s a very nice package.
Why would anyone pick Stata over R? R has many advantages, but here are some reasons that you might pick Stata:
- You’re in a field or organization where Stata is widely used.
- You want a GUI. (Stata is excellent because it has a command line and a GUI, and it will show you the command line equivalents of your GUI choices so that you can learn the commands gradually or you can store and reuse the commands to automate tasks. The command line is considerably faster than any GUI could be.)
- You want to give commands, with the possibility of some automation, but aren’t really comfortable in a full-blown programming environment.
- You’ve been using Excel, and want a tool that reads Excel files and holds its data in a row/column structure, like a spreadsheet. (Like any real statistics program, it’s different from Excel, but it makes the leap as small as possible.)
- You want tons of documentation, and commercial support.
- You are using certain techniques where Stata is more advanced than R. (I believe I’ve read that Stata has some panel data capabilities that go beyond what R and its current packages can do, but can’t find the quote right now.)
Stata is much more consistent and seamless than SAS, SPSS, et al, and it’s reasonably priced with good options — especially if you’re a student. (Currently, they’re on a release schedule where they have a new version every other summer, with minor (free) updates in between. If you get a perpetual license, you can use that version and its updates forever.)
If you want something Stata-like that’s free, you can check out the gretl project, but Stata is superior to gretl in my opinion. Especially if you’re a student, I really think the perpetual license for $200 (Stata 13/IC) is a great deal: you can use the version you purchase throughout school and beyond. (You’ll be stuck with that version when the next version comes out, but if you buy now (summer 2013), that’s two years away.)
We have rolled out Stata 14 as a remote desktop service and would like to hear about your user experience, especially when accessing the software from Mac, iOS or Android. A 30-day free trial is available for testing here: https://www.apponfly.com/en/application/stata?KAI
Also try SPSS and NCSS if you’d like to compare statistical software. Those two are also really good.
And, by the way, there is a free Stata 14 trial in the cloud, if somebody needs it: https://www.apponfly.com/en/application/stata?KAI=
Dear Mr. Folta!
Thank you so much for all your advice and kind wishes. Greatly appreciated. I follow you on Twitter. I also would like to connect with you on LinkedIn, but didn’t want to send a request until you let me know if you’d like it, too.
Aleksandr (Alex) B.
Please feel free to connect on LinkedIn.
I appreciate your desire to help. My field of study is Information Systems (Ph.D. program). My dissertation proposal was recently approved, so I am now at the stage just before data collection. My planned research involves statistical data analysis, mainly structural equation modeling (SEM). I would describe my level of expertise in statistics and SEM as beginner. I am currently evaluating several software products that would allow me to perform all the needed analysis. I have shortlisted Stata 13, STATISTICA 12, SmartPLS and WarpPLS. For various reasons, I prefer not to deal with R for my dissertation, despite my awareness of its power, its flexibility and the existence of SEM packages (unless there are extremely compelling arguments for using R). I downloaded SmartPLS (free) as well as STATISTICA 12 and WarpPLS (both trial versions). I submitted a request for a trial/evaluation version of Stata 13, but haven’t heard from them yet. I am not sure if you have had experience with STATISTICA or these SEM-PLS programs, but I am interested in your opinion on using Stata 13 for my research study. I need both general statistical analysis and SEM functionality, so I very much doubt that SmartPLS and WarpPLS would satisfy my requirements, but I’d like to play with them a little to get an idea of their power and functional range. This leaves me with a choice between STATISTICA and Stata, which I was hoping you would help me to make. Maybe you have other recommendations. From reading their product descriptions, it seems to me that these two products are quite similar (including a rich GUI and SEM support), but each has at least one advantage: STATISTICA has R integration and API support (just in case), and Stata has synchronization between the GUI and the command line. Sorry for the long message; I hope I didn’t steal too much of your time. I look forward to hearing from you.
P.S. My PC is not very powerful: it’s a 2.2GHz i3 laptop with 4GB (3.85GB usable, so not sure if 64-bit software would be beneficial), under Windows 7.
It’s my impression that Statistica is more graphically oriented (drag nodes, draw lines between them), though I’ve only looked at it briefly. In my mind, it’s more of a data mining tool than a statistics tool, if that distinction makes any sense to you.
I’d say that Stata’s SEM implementation is quite nice, including straightforward SEM and also a generalized SEM (GSEM), which can do some nice things. You can create a SEM through a GUI (drawing boxes and lines), which creates a textual specification, but I can’t find a way to go from a textual specification to a GUI representation, so it seems that you have to start with the GUI to use the GUI.
In fact, Stata 13.1 was just released, and it adds a nice GSEM feature: “gsem’s family(gaussian) option now has suboptions for censoring when link(identity) is used.”
I’d lean towards Stata, other things being equal, and I hope you can get a demo version. It has a full range of standard statistical tools, attractive graphs, a nice SEM/GSEM implementation, and its data layout is very similar to a (single-page, single-sheet) spreadsheet, if you’re comfortable with spreadsheets. (You don’t put formulas into the spreadsheet, just data.) And you can do almost anything through menus, with the resulting commands showing up in the log, where you can copy/paste/reuse them.
For several years now, Stata’s been on an every-two-year upgrade cycle and Stata 13 just came out in June, so if you buy it now, it’ll be up to date at no cost through June of 2015.
From my experience, Stata’s pretty efficient with your computer’s resources, though I’m running on a laptop that’s fairly fast (16 GB i7 Mac). Still, it seems quite efficient to me. Right now, I have a small dataset that’s 74 rows by about 30 columns, and Stata’s using 180 MB of RAM, which puts it in 7th place, behind the Finder, Mail, a text editor, and Safari. (For comparison, something as simple as Dropbox takes 33 MB.)
How large will your data sets be, and are there any particulars about your data like needing a live feed, or tying in to other programs, databases, etc? (Stata has an interface to Java.)
Thank you for a fast and comprehensive reply. Interestingly enough, I received an e-mail message from StataCorp with a license and activation key just minutes after my previous message to you. They sent me a 30-day license for Stata/MP 13.1 (2 cores), which I am not very happy about. I’d prefer to evaluate performance using the version I plan to use, and, since I can’t afford to purchase a full MP version (even at student pricing, it’s prohibitively expensive to purchase and not available for renting), it might be a little confusing. However, I can get an idea of the non-MP version’s expected performance by reversing StataCorp’s statement that the MP version on 2 cores is 40% faster overall and 72% faster in estimation calculations.
I installed the software, registered it online, updated it online to the latest version and played with its interface briefly. My first impression is as follows: speed: excellent; auto-update: excellent; GUI: good (somewhat unusual, but reminiscent of the good old Delphi IDE); menu layout: OK (I didn’t like the 3-4 levels of sub-menus); documentation: excellent (e.g., the SEM reference manual alone is 581 pages long!).
Based on the limits of the various Stata versions, I believe I can get away with Small Stata, as I expect my datasets to have far fewer than 99 variables and 1200 observations. However, I don’t fully understand their phrase “The number of observations is limited only by the amount of RAM in your computer” (said with respect to Stata/IC, but it seems to apply to Small Stata as well). Should I be concerned that my PC has just 3.85GB of usable RAM? (As of writing, my PC’s free memory is hovering around 400MB: between 385MB with Stata running [no data] and 440MB without.) I’m not sure how aggressively Stata uses memory for SEM calculations, but I can try to find out. Or simply close my dear Mozilla Firefox browser…
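The “reversing” arithmetic mentioned above can be sketched roughly as follows. This is only an illustration, and it assumes that “X% faster” means the 2-core MP build gets through (1 + X/100) times as much work per unit of time; the function name and the 10-second baseline are hypothetical, not anything taken from StataCorp.

```python
def single_core_time(mp_time_seconds, percent_faster):
    """Estimate a non-MP run time from a run time measured on the MP trial.

    Assumption: "percent_faster" is a throughput ratio, so 40% faster
    means the MP build runs at 1.40x the speed of the non-MP build.
    """
    speedup = 1 + percent_faster / 100
    return mp_time_seconds * speedup


# Hypothetical task that takes 10 seconds on the 2-core MP trial.
overall = single_core_time(10, 40)      # the "overall" figure quoted above
estimation = single_core_time(10, 72)   # the "estimation" figure quoted above
print(f"overall: about {overall:.1f} s, estimation: about {estimation:.1f} s")
```

Under that reading, a job timed on the MP trial would take roughly 1.4 to 1.7 times as long on a single-core edition. If the vendor’s percentages are defined differently, the conversion changes accordingly.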
My data sources are open (my research is on success factors of open source software development), but it’s still not clear what would be the optimal way to access and process the data prior to the analysis. I am considering two options: 1) manual extraction of data from all sources with subsequent merging on my PC in a local database; 2) writing a script or series of scripts that would automagically extract data, merge it on the fly and then feed to the stat. software. That’s why I’m still slightly interested in R (and, thus, Statistica’s integration with R). All this is still pretty fuzzy. I am waiting to receive replies from several people affiliated with data sources with some details. In the meantime I’m evaluating the software and thinking about the optimal strategy for data collection. By the way, do you know if Stata supports splitting a dataset into two for pilot and main stat./SEM analysis?
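For what it’s worth, the kind of pilot/main split I have in mind can be sketched outside any statistics package. The snippet below is a minimal illustration in plain Python, not a description of how Stata does it; the function name, the 20% pilot fraction, the seed and the stand-in project IDs are all hypothetical.

```python
import random


def split_pilot_main(rows, pilot_fraction=0.2, seed=12345):
    """Randomly split rows into a pilot sample and a main sample.

    A fixed seed makes the split reproducible, so the same records
    end up in the pilot set every time the script is rerun.
    """
    rng = random.Random(seed)
    shuffled = list(rows)          # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * pilot_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for 100 open source project records.
project_ids = list(range(1, 101))
pilot, main = split_pilot_main(project_ids)
print(len(pilot), len(main))
```

The two samples never overlap, which matters if the pilot is used to tune the model before the main analysis.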
That’s all for now. Thank you for your time. I look forward to hearing from you.
P.S. In fairness to Statistica, I need to say that it seems to me (based on product info and a brief evaluation) that Statistica has a comprehensive set of general statistics, SEM and other advanced features (like the data mining you mentioned, SPC, QC, etc., depending on the edition) and its GUI is very nice, but its statistics/SEM features may not be as comprehensive and well documented as the ones in Stata. In addition, while their student rental pricing is excellent, it seems that their regular pricing is not (in comparison with Stata). Also, the Statistica Ultimate Academic Bundle (student version) is not available for purchase, while Stata provides this possibility.
I bought Stata/IC 13, which is a perpetual license, so it will work for years. (I believe they also have a one-year rental price, but I’d rather be able to use a useful piece of software for as long as I want.) Plus, it’ll be upgraded for free until version 14 comes out, which is about two years away. It’s a great deal for students. I don’t think it’s worth the risk to get the Small version, since it’s hard to tell when you’ll run into one of its limitations.
You should be able to load a set of data that’s roughly the size of the data you intend to use, and see how much RAM Stata uses. I’m pretty sure that all versions will use memory the same way; the higher versions simply allow you to use larger datasets and multiple CPUs.
R is the most flexible option overall, of course, and it also has a couple of SEM options. I like the package ‘lavaan’, which has a nicer syntax for creating SEMs. You should check it out (http://www.jstatsoft.org/v48/i02/paper). Like Stata, R mostly requires that all the data be loaded into RAM.
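Before even loading anything, a back-of-the-envelope estimate gives a feel for the numbers. The sketch below is in Python rather than Stata, and it assumes every value is stored as an 8-byte number (a deliberately generous guess); it counts only the raw data, not the program’s own overhead or whatever working memory SEM estimation needs. The function name is hypothetical.

```python
def dataset_bytes(observations, variables, bytes_per_value=8):
    """Rough size of the raw data, assuming a fixed width per value."""
    return observations * variables * bytes_per_value


# The Small Stata limits mentioned earlier: 1200 observations, 99 variables.
small_limit = dataset_bytes(1200, 99)
print(f"{small_limit} bytes, or about {small_limit / 1024**2:.2f} MiB")
```

Even at the Small Stata ceiling the raw data comes to under 1 MiB, so on these assumptions the dataset itself is unlikely to be what strains a machine with a few hundred MB free. Actually loading a similar-sized file is still the real test.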
I don’t know much about Statistica (Windows-only and I was using it as a part of a data mining exercise).
Good luck on your program choice and your research!
Hello, Mr. Folta!
I found your interesting blog via your comment on The Popularity of Data Analysis Software (http://r4stats.com/articles/popularity). I was hoping that you would be so kind as to give me some advice on selecting statistical software (I am a Ph.D. candidate). Should you agree, please let me know if it’s OK to send you an e-mail with my question(s).
I’d be glad to answer any questions; feel free to ask them here. What field are you studying?