more-arw search

Q&A Forum

To our Excel Gurus...

Someone I'm connected to on LinkedIn is sourcing a job. One of the myriad of requirements is that the person had experience with Excel sheets containing 100K+ rows. Why would you trust your business to a 100K row spreadsheet? Wouldn't recalc take a very long time? The chance for file corruption increase? The ability to jeopardize data integrity increase? The chance for a cell formula to be wrong and repeated [endlessly]? Are there not better products in which to maintain and analyze the data? I'm having difficulty understanding a) obviously why you would use Excel and b) what special skill sets one has to handle this mega spreadsheet (over "normal" sized sheets)?

Answers

Topic Expert
Charley Kyd
Title: Founder
Company: ExcelUser
(Founder, ExcelUser) |

Wayne,

Broadly speaking, a worksheet with more than 100K rows could be of two types. One type could be an analysis with 100K+ rows. If that's what the employer has in mind, I definitely agree that Excel reports and analyses NEVER should be that large.

The other type of workbook could be a data workbook, which could contain Excel Tables much larger than 100K rows. (Tables with *no* calculations can approach a million rows.) Done right, this type of design would allow any number of report workbooks to reference several such data workbooks.

There are many advantages to this type of design for Excel reports. Some include...

...The data typically comes from CSV or export-to-Excel files, so there's no manual data entry, just copy and paste to a data workbook each period.

...By updating a data workbook once, you automatically update any number of reports or analyses that link to that workbook.

...Data workbooks can include sources of data not readily available to "better" analytical products, this could include data from silo systems, economic data, weather data, benchmarks, competitive data, population data, interest and exchange rates, and so on.

...Data integrity need not be an issue with this design, for several reasons: (1) Users easily can create data-scrubbing Excel reports. (2) The Excel data has an audit trail from a report, to the data workbook, to the original source of the data. (3) When many Excel users analyze the same data set, they're bound to uncover any errors that *do* exist, and those errors typically surfaced in the source data, not Excel. (4) With large data sets available, there's typically enough data for the Excel reports and analyses to be reconciled, a process that can reveal errors in both the reports *and* the source data.

...And finally, the knowledge workers who would tend to create the reports and analyses live and breathe Excel. So they have the benefits of using a product they know well to report and analyze consistent data from a reliable source.

Topic Expert
Wayne Spivak
Title: President & CFO
Company: SBAConsulting.com
LinkedIn Profile
(President & CFO, SBAConsulting.com) |

Thanks Charley

Of course, the source request didn't specify one way or the other as to the type of data being kept.

One question. Putting ad-hoc queries aside (which is a big aside for many reasons, such as defining what ad-hoc means in the first place) isn't pre-programming a report from a database optimized to do retrieve and compute a more effective method of retrieving results?

Topic Expert
Charley Kyd
Title: Founder
Company: ExcelUser
(Founder, ExcelUser) |

Let me answer your question this way...

My first job after my MBA was as a cost accounting manager at HP. Because I had programmed for years, I quickly started to generate my own reports, using the same resources that IT used. Those reports gave me facts that I needed to find errors in the data, make journal entries, understand variances, and so on.

One day, the division's Marketing Manager showed up at my desk and asked, "Are you the guy who's doing so much programming?" When I admitted that I was, he said that he'd just seen a product that was for bean counters, and asked if I would spend some time with it and tell him what I thought.

The program was VisiCalc, and I fell in love with it. Finally, I had found a program that allowed me to combine data and my subject-matter expertise. I was no longer restricted to just dumping out historical data for my own use. Instead, spreadsheets allowed me to combine historical data *from any available source* with sophisticated calculations and (at times) assumptions to generate presentations that gave *meaning* to data.

Spreadsheets gave me the power to give senior managers the information they needed, but didn't know they could get.

Canned reports *document* past performance; good spreadsheet reports *explain* past performance and help to *predict* future performance. We call all of them "reports", but there are significant differences among them.

Topic Expert
Wayne Spivak
Title: President & CFO
Company: SBAConsulting.com
LinkedIn Profile
(President & CFO, SBAConsulting.com) |

I remember VisiCalc. BTW, as a sidebar, the creation and life of VisiCalc is fascinating reading. Maybe one of the greatest tales of the perils of sitting on your laurels.

Topic Expert
Wayne Spivak
Title: President & CFO
Company: SBAConsulting.com
LinkedIn Profile
(President & CFO, SBAConsulting.com) |

I guess my biggest issue with spreadsheets is when they are used as source of data, not a reporting engine.

Gregg Kimmer
Title: Director of Finance
Company: Humana
(Director of Finance, Humana) |

I suspect the data source (i.e., the thing that will turn into the final analysis) is 100k+ rows. But, the end result used for analysis is probably a pivot table or other summarized report.

By the way, 100k+ rows isn't really overwhelming if a person needs to be an Excel guru. The newer version of excel allows for 1m+ rows.

Bob Low
Title: Principal
Company: Perron & Low
(Principal, Perron & Low) |

This made me check the number of rows in a spreadsheet I have with sales detail from last year for a client -- it comes in at 70,000. It's basically a table with data exported from several QuickBooks files. Because QB doesn't do consolidation, we can't get a canned report to do the same thing. However, working with this many rows is no more complicated than if it were 1000 rows -- just things like pivot table and filtering. No issues with recalc time. So, 100,000 in and of itself is not a problem and probably not uncommon if it's a data export. But Excel skills will still be needed to check data integrity and you do have to question why the accounting system can't do the analysis. In our case, we are in the process of replacing QB with a system that can handle multiple entities so we don't need to export to Excel.

Deborah Schuler
Title: retired
Company: na
(retired, na) |

I would not want to work with an 100k Excel spreadsheet that contained formulas. That is a recipe for disaster if a formula goes wrong or is accidentally overwritten. It's ok if we are only talking the data portion which rolls up to another sheet with the formulas and even then you have to be very careful that those formulas are pulling the data properly. For very large data sets, I prefer to do the preliminary work in an Access database and then pull the results into Excel to do any further processing if necessary. Access is much easier to use with large amounts of data although you still have to be careful that it is consolidating the data properly.

Topic Expert
Charley Kyd
Title: Founder
Company: ExcelUser
(Founder, ExcelUser) |

Wayne,

>I guess my biggest issue with spreadsheets is when they are used as source of data, not a reporting engine.

In no particular order, here are the general sources of data I know of that we can use for creating spreadsheet reports:

1. Manual data entry in random cells needed for a report, or "hand-carved" reports that combine imported data with a lot of manual manipulation in the spreadsheet. These are horrible methods, because they're slow and error-prone. As a general rule, if your worksheet report contains numbers in cells, you're making a big mistake.

2. PivotTables. The good thing about pivots is that they're live-linked to external sources of data. The bad things are: You're typically limited to relational sources. You're limited to documenting historical data, not explaining it. You're limited to one view of data for each report. Few users know how to create pivots. And fewer still know how to set up pivots as an Excel database.

3. Excel-friendly OLAP. These are programs like TM1 and PowerOLAP, which provide worksheet functions that (1) return data from cubes on the server, and, (2) allow users to write data for budgets and forecasts directly from spreadsheets back to the server. These are very powerful solutions.

4. ERP-linked software. Products from F9 Software, Global Software, BizNet Software, and the like, offer worksheet functions that are live-linked to ERP data. They offer a lot of power for reporting against SAP, Sage, Microsoft Dynamics, and other ERPs.

5. Data worksheets. Using this method offers the most-flexible and lowest-cost way to live-link spreadsheet reports to reliable sources of data.

Sources 3, 4, and 5 all can be used on one spreadsheet to create one report that's live-linked to the three sources of data. Even one cell in the report could be linked to all three sources.

So, Wayne, other that source #1, which of these do you have a problem with? Or do you have additional problematic methods in mind?

3491 views
Products and Companies

Get Free Membership

By signing up, you will receive emails from Proformative regarding Proformative programs, events, community news and activity. You can withdraw your consent at any time. Contact Us.

Business Exchange

Browse the Business Exchange to find information, resources and peer reviews to help you select the right solution for your business.

Learn more

Contribute to Community

If you’re interested in learning more about contributing to your Proformative community, we have many ways for you to get involved. Please email content@proformative.com to learn more about becoming a speaker or contributing to the blogs/Q&A Forum.