OT: Spreadsheet software for very large files
Started by dan7000
on 10/1/2014
dan7000
10/1/2014 11:08 pm
Have been looking for this for a few days and just realized maybe the CRIMP community would have some ideas.
I've been working with some large spreadsheets lately (100,000 rows, 50 or so columns). The main thing I need to do with them is just sort and filter on different columns. But in Excel, every time I sort or filter on a new column it takes 5 minutes and up. I've tried Excel 2010 and 2013, same thing. This seems like something that modern software should do better, but I can'f seem to find any alternatives. I tried uploading to Google Sheets but despite the claims on the Sheets website (claiming to have no size limit), it fails with an error that the file is too large. I tried uploading in sections but you cannot copy and paste the sections together because it hangs attempting to copy/paste large numbers of rows. I guess I could type it all into a new Google Sheet one row at a time if I wanted to kill a week or so.
Any suggestions for online or windows-based software that can work with this size of spreadsheet without hanging?
I've been working with some large spreadsheets lately (100,000 rows, 50 or so columns). The main thing I need to do with them is just sort and filter on different columns. But in Excel, every time I sort or filter on a new column it takes 5 minutes and up. I've tried Excel 2010 and 2013, same thing. This seems like something that modern software should do better, but I can'f seem to find any alternatives. I tried uploading to Google Sheets but despite the claims on the Sheets website (claiming to have no size limit), it fails with an error that the file is too large. I tried uploading in sections but you cannot copy and paste the sections together because it hangs attempting to copy/paste large numbers of rows. I guess I could type it all into a new Google Sheet one row at a time if I wanted to kill a week or so.
Any suggestions for online or windows-based software that can work with this size of spreadsheet without hanging?
moritz
10/2/2014 12:58 am
You should check out Excel 2013 Power Pivot (... typically in combination with Power Query and Power View).
We have done some work for clients to re-platform massive spreadsheets (e.g. 10m cells, 4m formulas) into the new "in memory BI" paradigm.
Results are very impressive: Spreadsheet size came down from 300 MB (same example) to 20MB (!!), number of formulas reduced from 4m down to 40 (by separating out the data transformation part via the tabular model and Power Query transformation rules).
Official limit on number of rows is 2 billion, the engine is quite fast as well (almost everything runs real time).
Powerpivotpro.com is a good resource to get you started.
One downside (could be a deal breaker for you): The PowerPivot capabilities require Excel 2013 "Professional" (or Office 2013 Professional or equivalent Office 365 subscription). Not included with "Home" versions of the software.
We have done some work for clients to re-platform massive spreadsheets (e.g. 10m cells, 4m formulas) into the new "in memory BI" paradigm.
Results are very impressive: Spreadsheet size came down from 300 MB (same example) to 20MB (!!), number of formulas reduced from 4m down to 40 (by separating out the data transformation part via the tabular model and Power Query transformation rules).
Official limit on number of rows is 2 billion, the engine is quite fast as well (almost everything runs real time).
Powerpivotpro.com is a good resource to get you started.
One downside (could be a deal breaker for you): The PowerPivot capabilities require Excel 2013 "Professional" (or Office 2013 Professional or equivalent Office 365 subscription). Not included with "Home" versions of the software.
jimspoon
10/2/2014 4:02 am
What's your hardware, how much RAM? I have used Excel 2007 to filter a spreadsheet with well over 100,000 rows, and it was very fast. Sorting went quickly too. Are the filters and sorts very complex?
I had much better luck with Excel 2007 than OpenOffice Calc.
I had much better luck with Excel 2007 than OpenOffice Calc.
Stephen Zeoli
10/2/2014 11:15 am
I suggest giving Panorama Sheets a try:
http://www.provue.com/panoramasheets/
It's really a database that behaves like a spreadsheet -- supposedly. I don't use the application myself, so I'm not recommending it. I'm just saying it might be an alternative that would work and they have a free trial, so you can give it a go before you buy.
Steve Z.
http://www.provue.com/panoramasheets/
It's really a database that behaves like a spreadsheet -- supposedly. I don't use the application myself, so I'm not recommending it. I'm just saying it might be an alternative that would work and they have a free trial, so you can give it a go before you buy.
Steve Z.
dan7000
10/3/2014 1:20 am
Thanks all!! I'll definitely try powerpivot and panorama sheets and let you all know the outcome. I have a relatively new lenovo laptop with 8GB RAM and Windows 7 So I'm not sure what makes it so slow.
dan7000
10/3/2014 1:25 am
Oops looks like Panorama is Mac-only :(
Dominik Holenstein
10/3/2014 11:59 am
Panorama Sheets and Panorama are available for Windows as well:
http://www.provue.com/panorama/riskfreetrial/index.html
Best,
Dominik
http://www.provue.com/panorama/riskfreetrial/index.html
Best,
Dominik
Alexander Deliyannis
10/3/2014 5:58 pm
I find the product very interesting, but it appears to have not been upgraded since 2010:
http://www.provue.com/Downloads/history/index.html
http://www.provue.com/Downloads/history/index.html
dan7000
10/3/2014 7:32 pm
Panorama crashed with the first spreadsheet I tried to import.
Sorry for veering even more off-topic but for those who are interested, a colleague discovered that the spreadsheets work much faster in Excel when hyperlinks are removed - a couple of the columns had hyperlinks. Perhaps excel verifies the links when opening or filtering the database or something. That would certainly slow things down.
Sorry for veering even more off-topic but for those who are interested, a colleague discovered that the spreadsheets work much faster in Excel when hyperlinks are removed - a couple of the columns had hyperlinks. Perhaps excel verifies the links when opening or filtering the database or something. That would certainly slow things down.
22111
10/4/2014 3:14 pm
Just for the record: Zeoli mentioned Panorama Sheets (and did this in spite of dan7000's specifications), and now dan7000 speaks of Panorama. It's unfortunate dan7000 seems to be inclined to give up sometimes rather soon ("Mac only"), since I doubted from start on that his amount of data would go into Sheets (see his first post), which is incredibly crippled vàv Panorama, both in functionality AND in amount of data it can process (but also in price, to be fair) - of course, any such choking should not result in a crash but in an adequate error dialog; I myself had considered Sheets briefly once, and had immediately mused about the availability of and reason in a possibility to cut down existing data, far exceeding the 50 mb allowed by Sheets, into some "relevant" part(s) Sheets then could process; I discarded this idea in favor of some further scripting, instead of multiplying tools, and multiplying transitions forth and back.
This being said, it would have been interested to know some details here - OT or not, it's IM, and that should always be welcomed here if I dare give my opinion?
As it is, it's to be feared though that this thread will die soon as did so many threads before: without giving any new info whatsoever, onT or OT... (Well, the Excel slow down by links is a known fact, thus cannot count as new info: http://www.excelforum.com/excel-general/938169-excel-slow-open-save-i-think-because-of-hyperlinks.html , http://blogs.technet.com/b/the_microsoft_excel_support_team_blog/archive/2012/12/18/top-10-list-of-performance-issues-in-excel-workbooks.aspx , and so on.)
And I had been really eager to get some new knowledge here, about Sheets and/or Panorama... ;-(
This being said, it would have been interested to know some details here - OT or not, it's IM, and that should always be welcomed here if I dare give my opinion?
As it is, it's to be feared though that this thread will die soon as did so many threads before: without giving any new info whatsoever, onT or OT... (Well, the Excel slow down by links is a known fact, thus cannot count as new info: http://www.excelforum.com/excel-general/938169-excel-slow-open-save-i-think-because-of-hyperlinks.html , http://blogs.technet.com/b/the_microsoft_excel_support_team_blog/archive/2012/12/18/top-10-list-of-performance-issues-in-excel-workbooks.aspx , and so on.)
And I had been really eager to get some new knowledge here, about Sheets and/or Panorama... ;-(
Franz Grieser
10/4/2014 8:28 pm
22111
And what exactly was the new info you provided in your post?
Franz
(yes, I know MY post does not contribute anything productive to the original question, either)
22111 wrote:
And what exactly was the new info you provided in your post?
Franz
(yes, I know MY post does not contribute anything productive to the original question, either)
22111 wrote:
Just for the record: Zeoli mentioned Panorama Sheets (and did this in
spite of dan7000's specifications), and now dan7000 speaks of Panorama.
It's unfortunate dan7000 seems to be inclined to give up sometimes
rather soon ("Mac only"), since I doubted from start on that his amount
of data would go into Sheets (see his first post), which is incredibly
crippled vàv Panorama, both in functionality AND in amount of data
it can process (but also in price, to be fair) - of course, any such
choking should not result in a crash but in an adequate error dialog; I
myself had considered Sheets briefly once, and had immediately mused
about the availability of and reason in a possibility to cut down
existing data, far exceeding the 50 mb allowed by Sheets, into some
"relevant" part(s) Sheets then could process; I discarded this idea in
favor of some further scripting, instead of multiplying tools, and
multiplying transitions forth and back.
This being said, it would have been interested to know some details here
- OT or not, it's IM, and that should always be welcomed here if I dare
give my opinion?
As it is, it's to be feared though that this thread will die soon as did
so many threads before: without giving any new info whatsoever, onT or
OT... (Well, the Excel slow down by links is a known fact, thus cannot
count as new info:
http://www.excelforum.com/excel-general/938169-excel-slow-open-save-i-think-because-of-hyperlinks.html
,
http://blogs.technet.com/b/the_microsoft_excel_support_team_blog/archive/2012/12/18/top-10-list-of-performance-issues-in-excel-workbooks.aspx
, and so on.)
And I had been really eager to get some new knowledge here, about Sheets
and/or Panorama... ;-(
22111
10/7/2014 4:55 pm
If you can't see it, don't look further, since then it's not important for you.
And after this short interlude, we're where we've been before, more or less.
We've got some clarification, but no specifics about possible real use of either program.
Thus, we've got another name dropping thread. You know, in theory, I very much like to just read about outlining and such, but then, I really love to get some real information, albeit not writing here for myself anymore.
Wasn't this originally for sharing experiences with software? Is "I just touched it (which version?), but never did anything do with it" enough now for an experience worth to be shared?
More specifically: It would have been worthwile to read if the full version choked with the data, and then, by which procedure(s), or if the experience was meaningless because the lite version choked, with data it was never meant to work with.
These are questions I just ask as an avid reader. Do you think I should not be entitled to ask them?
And after this short interlude, we're where we've been before, more or less.
We've got some clarification, but no specifics about possible real use of either program.
Thus, we've got another name dropping thread. You know, in theory, I very much like to just read about outlining and such, but then, I really love to get some real information, albeit not writing here for myself anymore.
Wasn't this originally for sharing experiences with software? Is "I just touched it (which version?), but never did anything do with it" enough now for an experience worth to be shared?
More specifically: It would have been worthwile to read if the full version choked with the data, and then, by which procedure(s), or if the experience was meaningless because the lite version choked, with data it was never meant to work with.
These are questions I just ask as an avid reader. Do you think I should not be entitled to ask them?
Franz Grieser
10/7/2014 9:48 pm
22111 wrote:
You have every right to ask these questions. In my world.
I just took offence at your asking "where's the meat" (that's not meant literally) after writing sentence after sentence without providing meat.
Sorry, I had just lost my patience.
Franz
These are questions I just ask as an avid reader. Do you think I should
not be entitled to ask them?
You have every right to ask these questions. In my world.
I just took offence at your asking "where's the meat" (that's not meant literally) after writing sentence after sentence without providing meat.
Sorry, I had just lost my patience.
Franz
dan7000
10/7/2014 11:39 pm
I'm not really sure what is being asked but I'll try to give more detail about my experience if it's helpful, although I cannot promise that it will not be old news, like the hyperlink issue in Excel, which I did not know about previously.
Panorama imports CSV. I did not see any mention of a size limit. When you import the CSV it will not take the first row as field names, but forces you to type in field names, types, and defaults for every column. The CSV I tried to import initially had about 60 columns. After seeing that I had to type in all these names I went back and created a version with only about 25 columns. As I was typing in those names, the dialog box for field name entry gradually began having display issues. While initially you would see a list of 8 or so fields with a scroll bar at the right, as I kept typing them in the list grew shorter and took up only the top part of the dialog box and part of the scroll bar was missing. Eventually the box stopped responding. The whole experience was janky enough to convince me not to try the software again.
22111 also mentions "Sheets" - I assume this might be referring to Panorama sheets, but it also might be referring to Google Sheets, which I mentioned in my first post. Google Sheets does not have a MB limit. The old version did, but "All spreadsheet limits mentioned above have been removed in the new version of Google Sheets." https://support.google.com/drive/answer/37603?hl=en
But when I recently went back to that page to confirm, I realize it also says "The new version of Google Sheets should support 2 million cells of data, though please note that extremely large spreadsheets may have slower performance." While 2 million sounded like a lot, I realize that my initial spreadsheet clearly has more than that (100k rows X 60 columns). So I was exceeding the cell number limit.
I have Google sheets set to convert excel to sheets format on upload. The error I get is unhelpful, something to the effect of "The upload has failed. Either input or output size is too large or the upload required too much system resources." [Yes, it has the bad grammar, making me think not too many people see this error.]
I also considered Office 365 Excel Online which has a ridiculous 5 MB limit.
I have not tried powerpivot, since after deleting hyperlinks Excel on Windows is working great.
Panorama imports CSV. I did not see any mention of a size limit. When you import the CSV it will not take the first row as field names, but forces you to type in field names, types, and defaults for every column. The CSV I tried to import initially had about 60 columns. After seeing that I had to type in all these names I went back and created a version with only about 25 columns. As I was typing in those names, the dialog box for field name entry gradually began having display issues. While initially you would see a list of 8 or so fields with a scroll bar at the right, as I kept typing them in the list grew shorter and took up only the top part of the dialog box and part of the scroll bar was missing. Eventually the box stopped responding. The whole experience was janky enough to convince me not to try the software again.
22111 also mentions "Sheets" - I assume this might be referring to Panorama sheets, but it also might be referring to Google Sheets, which I mentioned in my first post. Google Sheets does not have a MB limit. The old version did, but "All spreadsheet limits mentioned above have been removed in the new version of Google Sheets." https://support.google.com/drive/answer/37603?hl=en
But when I recently went back to that page to confirm, I realize it also says "The new version of Google Sheets should support 2 million cells of data, though please note that extremely large spreadsheets may have slower performance." While 2 million sounded like a lot, I realize that my initial spreadsheet clearly has more than that (100k rows X 60 columns). So I was exceeding the cell number limit.
I have Google sheets set to convert excel to sheets format on upload. The error I get is unhelpful, something to the effect of "The upload has failed. Either input or output size is too large or the upload required too much system resources." [Yes, it has the bad grammar, making me think not too many people see this error.]
I also considered Office 365 Excel Online which has a ridiculous 5 MB limit.
I have not tried powerpivot, since after deleting hyperlinks Excel on Windows is working great.
Stephen Zeoli
10/8/2014 11:04 am
I expected Sheets would do a better job than that, though I'm not sure why. I guess I just assume a database is usually more powerful than a spreadsheet, though the distinction is getting pretty blurry these days. Probably why Sheets has languished. I am sorry I brought it up in the first place.
Steve Z.
Steve Z.
22111
10/8/2014 12:31 pm
Franz, I see now and understand; indeed, my term of "meat" had been to strong I'm aware now. No offence whatsoever.
Steven, that would be a very non-constructive reaction, I'm thankful on the contrary you brought it up, see below.
dan7000, first I'm sorry for the muddle, my "Sheets" was distinction only between the full, and the lite version; I wasn't aware it could be misunderstood with regards to a Google applic. Second, thank you very much for this clarification that you tried with the full version indeed, and not with Steven's proposal of a lite version where it's evident that your data was too heavy for it to process (they prefer selling their full version, and who could blame them, it's just the denomination of the lite version which is a little bit misleading, but understable from their point of view, by having not only crippled it on the field of amount of data the lite version can process (which is quite unfortunate), but also by taking away its db qualities (which would make sense indeed, as the difference between lite (spreadheet-only) and the full (full db functionality) version (and thus their naming of the lite version, but which does not take into account the very important amount-of-data crippling).
This being said, the "meat" is there, now, we now know that even the full version chokes with a reasonable amount of data that it should have been able to process without difficulty, according to what they pretend for this (not cheap) full version. This is a big disappointment, but it's been very good to know, all the more so since most of us, most of the time, do trialling with just light amounts of data (which also makes the big interest of Paul Miller's blog, btw, for his outliner testing).
I'm sorry for not having worded my question as concise as I could have done it, and I'm very thankful for the clarifications that have come in-between; in order to add just some tiny "meat" to this discussion, allow me to say that I, having promoted the idea of ready-made, individual links anyway, instead of triggering the specific links in any given of multiple used applications, would suggest to add some code character before any link in Excel, in order for it to become unrecognizable as link for that application, and then intercept that special character when processing the links.
This way, Excel should NOT be slowed down by them anymore, and you would be able to trigger these entries as links notwithstanding; a simple leading comma should be fine (did not test this though), and if really needed, some more special special character would do the trick, e.g. the ansi character for "deceased" or some similar; as said before, the similar trick in outliners would succeed in facilitating any possible switch from one given outliner to another one.
(Of course, I'm asking myself how the presence of these (recognizable-as-such) links in the data to be imported into Panorama might have been the culprit in Panorama's choking over import, but anyway, that import should have been flawless. And even further down the line, this would not be the very first application that possibly works flawlessly within its original Apple environment, while presenting problems in its Windows variety after code transposition, be it for memory processing or for other reasons - this is just guessing on my part of course.)
Anyway, thank you very much, dan7000, for your kind clarification.
Steven, that would be a very non-constructive reaction, I'm thankful on the contrary you brought it up, see below.
dan7000, first I'm sorry for the muddle, my "Sheets" was distinction only between the full, and the lite version; I wasn't aware it could be misunderstood with regards to a Google applic. Second, thank you very much for this clarification that you tried with the full version indeed, and not with Steven's proposal of a lite version where it's evident that your data was too heavy for it to process (they prefer selling their full version, and who could blame them, it's just the denomination of the lite version which is a little bit misleading, but understable from their point of view, by having not only crippled it on the field of amount of data the lite version can process (which is quite unfortunate), but also by taking away its db qualities (which would make sense indeed, as the difference between lite (spreadheet-only) and the full (full db functionality) version (and thus their naming of the lite version, but which does not take into account the very important amount-of-data crippling).
This being said, the "meat" is there, now, we now know that even the full version chokes with a reasonable amount of data that it should have been able to process without difficulty, according to what they pretend for this (not cheap) full version. This is a big disappointment, but it's been very good to know, all the more so since most of us, most of the time, do trialling with just light amounts of data (which also makes the big interest of Paul Miller's blog, btw, for his outliner testing).
I'm sorry for not having worded my question as concise as I could have done it, and I'm very thankful for the clarifications that have come in-between; in order to add just some tiny "meat" to this discussion, allow me to say that I, having promoted the idea of ready-made, individual links anyway, instead of triggering the specific links in any given of multiple used applications, would suggest to add some code character before any link in Excel, in order for it to become unrecognizable as link for that application, and then intercept that special character when processing the links.
This way, Excel should NOT be slowed down by them anymore, and you would be able to trigger these entries as links notwithstanding; a simple leading comma should be fine (did not test this though), and if really needed, some more special special character would do the trick, e.g. the ansi character for "deceased" or some similar; as said before, the similar trick in outliners would succeed in facilitating any possible switch from one given outliner to another one.
(Of course, I'm asking myself how the presence of these (recognizable-as-such) links in the data to be imported into Panorama might have been the culprit in Panorama's choking over import, but anyway, that import should have been flawless. And even further down the line, this would not be the very first application that possibly works flawlessly within its original Apple environment, while presenting problems in its Windows variety after code transposition, be it for memory processing or for other reasons - this is just guessing on my part of course.)
Anyway, thank you very much, dan7000, for your kind clarification.
xtabber
10/9/2014 2:57 am
GS-Calc from Citadel5 can supposedly handle up to 12 million rows and use up to 16 processor cores during calculations. And to get the thread back on topic, it has a hierarchical view of sheets that looks like it might provide some outlining capabilities within a workbook.
It is also ridiculously cheap, at $20.
http://www.citadel5.com/
After reading some glowing reviews earlier this year, I purchased a license, as I sometimes work with files that have many thousands of rows and have also found Excel to be slow for that kind of thing. But I haven't had the need for it lately and haven't had the time to try it, so I can't tell you if it will work for the stated purpose, or at all, for that matter.
It is also ridiculously cheap, at $20.
http://www.citadel5.com/
After reading some glowing reviews earlier this year, I purchased a license, as I sometimes work with files that have many thousands of rows and have also found Excel to be slow for that kind of thing. But I haven't had the need for it lately and haven't had the time to try it, so I can't tell you if it will work for the stated purpose, or at all, for that matter.
