Excel - HLOOKUP vs INDEX

1695814 · December 9, 2020, 3:10am

I have an inherited spreadsheet where the programmer used an HLOOKUP. The array of lookup_values (number of months) is in row 127. The resultant value that is being look’d up is always 256 rows below that.

Wouldn’t it be smarter to use the INDEX function?

Obviously, if some dodo head should ever insert/delete rows between the row of months & the row of factors then that hard-coded 256 is going to fubar the calculation.

Not as obvious, and what I’m really wondering about, would it help with the calculation speed? This spreadsheet is huge and painstakingly slow to calculate. I’m looking for things to make it more efficient.

Side question - What other things should I look for to make this thing more efficient?

BruteForce · December 9, 2020, 3:50am

One thing that’s helped me speed up big spreadsheets is to clear excess formatting. I used to just use a macro I found online to do this. Recently I was able to add the Inquire tab to my Excel ribbon. Basically the idea is when people format entire columns/rows, or have a lot of data and then delete it, it slows the spreadsheet down.

BruteForce · December 9, 2020, 3:52am

I don’t know. I’ve found macros online that supposedly test how long a cell (or range) takes to calculate, but I’ve never been able to get them to work. I’d be interested in seeing Excel formulas ranked by most efficient or something.

I do know that volatile functions (indirect, etc.) really slow down a spreadsheet.

BG5150 · December 9, 2020, 2:49pm

INDEX confuses me. I really should learn it.

Vorian_Atreides · December 9, 2020, 3:15pm

The best way I’ve found to think about INDEX is to think in terms of vectors and/or matrices.

Default vector is in a “vertical” form. So, =INDEX(vector, row) will return the value in the row-th position of vector.

=INDEX(matrix, row, col) will return the value in the (row, col) position of matrix.

MATCH is often used with INDEX in that MATCH returns the (first) position of a “lookup” value within a vector.

=INDEX(vector, MATCH(value, vector, 0)) ← returns a value if there is an exact match of value in vector

SredniVashtar · December 9, 2020, 3:21pm

Hlookup and index(match()) should have roughly identical calculation speeds. They do roughly the same thing.

You SHOULD replace the hlookups anyway, because looking up ‘256 rows’ is unreadable garbage and prone to breaking in the manner you suggest.

Calculation speed is mostly a matter of killing volatile functions (and vba user defined functions). Generally they cause the whole thing to recalculate…

That said, its possible that you actually need to recalculate your spread sheet over and over again (say because you’re repeatedly changing the essential inputs, or loading in new data.) If that’s the case, your bottlenecks are the slow operations that your computer is performing. The common ones are Sumproducts, Sumifs, etc. Followed by lookups with exact matches.

dr_t_non-fan · December 9, 2020, 3:27pm

I use a “0”, not “TRUE”. I think the use of TRUE returns a 1, which is the default, which is NOT the exact match.
The match type is optional, but the default is “exact or next smallest”, which tells me that the Excel creators assumed alphanumeric ordered data back in the 90’s, and cannot correct it.

SredniVashtar · December 9, 2020, 3:31pm

Once you use it a few times, you’ll find it actually makes a lot more sense than vlookup/hlookup.

Vorian_Atreides · December 9, 2020, 3:33pm

Thanks for that catch.

dr_t_non-fan · December 9, 2020, 3:41pm

And, there would have to be a ton of them. Like, over 1000 or so, sumproducting 10’s of 10000’s of rows of data.
Here’s an example off the top of my head:
You have 500000 rows of claims data that include, among other things, the ZIP Code. You want to sum the data by ZIP Code, but the list of ZIP codes differ every time you pull this data (say, monthly).
So, you set up the data in one tab, and a list of all ZIP codes, 00000 to 99999 in another tab down Column A, and the formula in Cell B1 is: SUMPRODUCT(([Claims column])*([ZIP code column] = A1)). And paste 100000 copies of these down Column B for each possible ZIP Code.
Better way, obviously, is to pull the data already summed by ZIP Code, which is why it is more efficient in the long run to learn to pull your own data.

dr_t_non-fan · December 9, 2020, 3:50pm

I’ve inherited spreadsheets like this. You can check each tab’s “Far Point” by pressing “end” then “home” (I’m always using transition keys, 'cause old). Suppose you can see that the spreadsheet work ends in H40, but the Far Point is Z4000, You can highlight unused columns I through, say, AA, and Alt-E-D-C to delete whole columns and highlight unused rows 41 through, say, 4001, and Alt-E-D-R to delete whole rows (not just delete the cell contents, which is simply “Delete”).

BruteForce · December 9, 2020, 4:04pm

When I do that and then hit CTRL+Home again, it still takes me out to the column/row I deleted. The only way I can seem to make it take me to where the spreadsheet work actually ends is by using a clear excess formatting macro/add-in.

dr_t_non-fan · December 9, 2020, 4:09pm

After finishing the deletes, you need to save the file. THEN, end-home should take you to H40 (in my example).
If it doesn’t, then there is something else screwy with that tab. An alternative, if it doesn’t mess up other parts, is to create a new tab, copy only the working cells from the old tab to the new tab, then delete the old tab.

BruteForce · December 9, 2020, 4:21pm

Huh, I’ve never tried saving after. I’ll give that a try, thanks!

1695814 · December 9, 2020, 4:54pm

It took me a while to get the hang of INDEX/MATCH, but it’s been worth it.
5 stars!
Would recommend!!!1one