Excel SUMIF fixed range vs entire column

tl;dr: Which should I use, and which is more efficient: =SUMIF(A$5:A$5555,$D5,B$5:B$5555) or =SUMIF(A:A,$D5,B:B)?

I’m doing a SUMIF in Excel.

Col A has LOB starting in row 5 and going through row 5555.
Col B has Face Amount over the same rows.
Row 4 is a header.
Cell B3 is =SUM(B5:B5555)…just as a checksum since that’s good information to have.
Then starting in cell D5 I sum up the face amount by LOB, so D5 is “WL,” D6 is “UL,” etc., and E5 is =SUMIF(A$5:A$5555,$D5,B$5:B$5555)…that’s copied down col E for each LOB that I have.

I’m looking to make the calculation efficiency of the spreadsheet as good as is reasonable…if efficiency improves by 1% but usability degrades by 50% that’s not a good tradeoff.

Should I use =SUMIF(A$5:A$5555,$D5,B$5:B$5555) or =SUMIF(A:A,$D5,B:B)?

The problem with =SUMIF(A$5:A$5555… is that you always need to make sure that your data range is appropriately accounted for in the SUMIF statement. I’m wondering if =SUMIF(A:A… is horribly worse in calculation efficiency (since it may look at every cell in the column) or if it’s actually better. I have a vague recollection of reading somewhere that Excel actually does better with =SUMIF(A:A… but maybe I’m misremembering.

I’m reading this but haven’t found anything truly helpful so far.

I’ve started to use Tables frequently when I need SUMIFs, for just that reason. The SUMIF then uses the table column name, so the formula is invariant to the length of the data, but also very efficient. Tables also simplify data columns that are formulas, by auto-copying them down when the table changes length.
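For example (a sketch, assuming the Table is named tblPolicies with columns LOB and FaceAmount…substitute your own names):

    =SUMIF(tblPolicies[LOB], $D5, tblPolicies[FaceAmount])

The structured references stay correct no matter how many rows the table grows or shrinks by.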

+1 for data tables!

Thank you for that idea…do your tables get updated on a month-by-month basis? I’m just curious how one goes about making sure that the tables are complete without re-setting them up every month.

For this particular project, my data will always have fewer rows than the month before, but, in general, how would this work for data that may have more or fewer rows every month?

The optimal setup is an external data connection that queries the data directly into the table. Then you can just right-click and “Refresh” to update the data. The table length and the formula columns should all auto-correct.
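If you ever want to script that refresh instead of right-clicking, a minimal VBA sketch (this refreshes every connection in the workbook, not just one table):

    Sub RefreshAllData()
        ' Same as Data > Refresh All: refreshes every external
        ' connection and query in this workbook
        ThisWorkbook.RefreshAll
    End Sub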

If it’s the typical “get a file from somewhere/someone” and copy/paste the data into your file, then Tables are easier when the data gets longer with each update. If the data is longer, you can simply paste-value it in and the table will grow to the new length, and formula columns should auto-complete to the end. When the data is shorter, I usually delete (not clear) all rows except the header and the first data row (to preserve the formula columns), then paste-value my new data in. Since the new data is longer than the one-row table, the table will auto-grow and be ready.

I don’t actually know if it matters, but I generally keep all formula columns to the right of any data-only ones; and when I paste-value, I only overwrite the data portion, not the headers.

I appreciate those ideas. Thanks. :toth:

Should I use =SUMIF(A$5:A$5555,$D5,B$5:B$5555) or =SUMIF(A:A,$D5,B:B)?

In my experience, the latter performs just fine. I’ve never clocked it though.

You seem to have a lot of these questions. Maybe you should start a long thread and record all your findings. I’m vaguely curious about what you discover. :slight_smile:

(I’m also curious about whatever monstrous model you’re looking at.)

If you are appending data, just make sure you insert the new rows inside your old data range (so any formulas that reference that range expand automatically).
If you are replacing data, insert the new data inside the old data, then delete the old data.
(As long as the formats are equal. If not, well, that sucks for you.)

I generally paste the new data a few rows below the old data. Then, keeping an empty row above and below it, insert the new data into the old data range and delete the old data, leaving only the new data.
The empty rows make it easy to tell the old data from the new; without them you won’t know which is which.

Also, SUMPRODUCT((data) * (conditions)), not SUMIF. SUMIF sucks, no matter what people say, because the argument order is confusing.
For the OP formula: SUMPRODUCT((B$5:B$5555) * ($D5=A$5:A$5555))
(Remove the spaces around the * when you type it; the GoA Board treats single stars as italics.)
Also note that SUMPRODUCT does not require the data first and the conditions afterward.
It does, however, require the summed cells to be numbers.

A:A gets pretty slow.

I think I created a giant tool before, and the difference in calc speed between A:A and a defined range was significant.

Although, if I remember correctly, converting the data into a Table might give the best efficiency, because then you just specify the table range name.

Btw, “premature optimization” is considered a sin for programmers.

Basically it should be the last thing on your mind 99.9% of the time. Usually you should make an application, and then, if you need it to be faster, identify and eliminate bottlenecks.

The exception should only really occur if you are pretty sure you are doing something that you will later need to eliminate.

I think it was in the link that I posted above…SUMPRODUCT is older & slower than SUMIF/SUMIFS…if it wasn’t in that link, it was somewhere else on the internet that I read it.

SUMIF is kind of backwards. SUMIFS is more intuitive, imo, and I’ve used it in place of SUMIF even if I have only one condition.
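For the OP’s case the two are equivalent; SUMIFS just puts the sum range first and then pairs each criteria range with its criterion:

    =SUMIF(A$5:A$5555, $D5, B$5:B$5555)
    =SUMIFS(B$5:B$5555, A$5:A$5555, $D5)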

Thank you for that info. That’s useful to know.

Yeah, I’m going to have to look into this possibility a bit more.

Indeed, my predecessor set up this monstrosity. It works, and correctly at that.
I’ve inherited it and want to make it work better.
I have little to no choice but to be doing this in Excel.
But good point, nonetheless.

But, on the other hand, if you know how to do something better, why not just do it the right way (or at least a better way) the first time?

Right, this is a “replacing data” sort of thing…what I’ve done in the past is put a border row at the bottom of the data and just insert/delete rows above it as necessary. This works really well whenever I do the monthly updates. It doesn’t work so well when the intern or boss-guy tries to do the updating, because they don’t fully consider how the data fits into the process.

…which is why I was considering the A:A approach, since it wouldn’t require worrying about where the soggy bottom of the data was.

Well, if you’re going to be working on it for a while, I recommend finding a way to identify bottlenecks and clock your changes. There are often just a few formulas causing all your problems, though I guess it depends on a lot of details.

literally with a stopwatch? …or are there different ways to do that? (serious questions)

Uh, I mean with VBA. You could have VBA calculate something n times and tell you the runtime.
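A minimal sketch of that idea, assuming a standard VBA module (the macro name and repeat count are just for illustration):

    Sub TimeRecalc()
        Dim i As Long
        Dim t As Double
        Const n As Long = 50                 ' how many recalculations to average over
        t = Timer
        For i = 1 To n
            Application.CalculateFull        ' force a full recalculation of all open workbooks
        Next i
        Debug.Print "Average recalc time: " & Format((Timer - t) / n, "0.0000") & " sec"
    End Sub

Run it before and after a change to see whether the change actually helped.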

Though really identifying bottlenecks for me is like:

Does this thing take 10 seconds to run?
If I delete half of it, does it still take 9.9 seconds to run?

Yes…yes, I do. I have found that the good people of GoActuary and its predecessor are generally very helpful, so I post a lot of them here.

I have a short attention span…and tend to be able to do just one thing at time. I’ll leave it to the archaeologists & paleontologists to sift through the rubble that is my threads and come up with a story of how one man defied expectations and was able to make a career out of such a limited skillset.

It’s not that impressive…just a lot of VLOOKUP, SUMIF, SUMIFS, IF-THEN, etc…only in the tens of thousands of rows…not too bulky yet.

You can use whole-column references without issue. SUMIF and SUMIFS internally keep track of the used rows, so they don’t scan the entire column.

Tables are better for clarity but appending data can be slow if the table has formulas and you are not careful.