Subsampling with Excel (and a little help from Python)


I have always been a fan of subsampling. It is no secret that I find mod() a little bit limiting.

Suppose for whatever reason I wanted to run some regressions in Excel with subsampled data. That turns out to be easy.

I now have 3 CSV files from DataMiner downloads. I can concatenate those 3 files, subsample the result, and write it to an Excel file on my desktop. Then I can run the regression within that file and, if model averaging is my purpose, keep track of the results in another Excel spreadsheet.

Honestly, I thought this might be useful for some members, including those who use spreadsheets a lot. I will be using it and wanted to share it with those who are interested.
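Here is a minimal sketch of the concatenate, subsample, and write steps. It assumes pandas is installed along with an Excel writer such as openpyxl. The function names, the 50% fraction, and the seed are hypothetical choices for illustration, not anything specific to DataMiner.

```python
import pandas as pd


def concat_and_subsample(frames, frac=0.5, seed=42):
    """Stack the DataFrames and draw a reproducible subsample (no replacement)."""
    combined = pd.concat(frames, ignore_index=True)
    # random_state fixes the draw so the same rows come back on every run
    return combined.sample(frac=frac, random_state=seed)


def write_subsample(csv_paths, out_path, frac=0.5, seed=42):
    """Read the CSV files, subsample the combined data, and save it as .xlsx."""
    frames = [pd.read_csv(path) for path in csv_paths]
    sub = concat_and_subsample(frames, frac=frac, seed=seed)
    # to_excel needs an Excel engine (for example openpyxl) to be installed
    sub.to_excel(out_path, index=False)
    return sub
```

Calling something like write_subsample(["file1.csv", "file2.csv", "file3.csv"], "subsample.xlsx") with your own paths would produce the file to open in Excel. Changing the seed gives a different subsample for each model you want to average.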


Okay, that was so easy that I wondered whether I might, at times, want to just bootstrap a smaller file, keeping in mind that Excel has a limit of 1,048,576 rows per worksheet.
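A bootstrap along those lines might look like the sketch below, again assuming pandas. The function name bootstrap_rows and its arguments are hypothetical. The row check reserves one row for the header.

```python
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # per-worksheet limit, header row included


def bootstrap_rows(df, n_rows, seed=0):
    """Draw n_rows rows with replacement (a bootstrap sample), reproducibly."""
    if n_rows + 1 > EXCEL_MAX_ROWS:  # +1 leaves room for the header row
        raise ValueError("bootstrap sample would not fit in one Excel sheet")
    sample = df.sample(n=n_rows, replace=True, random_state=seed)
    return sample.reset_index(drop=True)
```

Because the draw is with replacement, some rows will repeat and others will be left out, and n_rows can be smaller or larger than the original file.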

Notice the use of “random_state” here. I can probably stop requesting that as a feature at P123 now.