I'd also like to be the first to offer a v11 specific project (but it's only slightly so
So - in the forums hash functions were mentioned (kind of in passing by Ben - and then by me) - so I thought I'd have a play!
The problem - storing 'tokens' (or a name) and a value is a very common one in computing - and speed of retrieval (or not if the token doesn't exist) is important. To do this - one technique is to combine the letters of a string into a hash value - this can then be used to divide the data into smaller sections to search.
Here I test 4 hash functions - a naïve add the characters of the string, HashDJB2, HashSDBM, and Hash FNV_1A - a quick google will reveal more if you are interested.
The value returned is used to split the data strings across 512 (as here) - buckets where each bucket holds a linked list of strings, values and pointers to the next string (or 0 at the end of the list) These are held in a large array of bytes (token_data) - I've used an esp32 - as it has plentiful RAM..
I generate as many random strings as will fit into the table - and then search for all of the strings
20x and 100000 random strings - some of which will exist and some not. Then report on the hash table statistics and time.
This gives :
Note - the number of tokens varies (random strings) - but as I don't alter the seed for random - is consistent over runs.Testing (fill table with random data)
Using Hash
Generating random symbols
Took 2634ms
Hash Table Statistics
Buckets in use = 509
Max bucket size = 38
Min bucket length = 0
Total tokens = 5877
Testing (fill table with random data)
Using HashDJB2
Generating random symbols
Took 2417ms
Hash Table Statistics
Buckets in use = 512
Max bucket size = 22
Min bucket length = 3
Total tokens = 5880
.....
GetBucket uses a switch - to select the hash function - and in the real world would just call 'HashFn'. I also haven't attempted to align data with word / long word boundaries - and just store a 16 bit value for each token - this would possibly be a pointer to a struct in a real application.
Strings are randomly generated - for example - if it has just one char, there are only 26 options ('A'..'Z') - this means there are duplicates - here they are ignored - though other strategies may replace the value etc
This - demonstrates how fast MCUs have become - and that unless large numbers of searches are required - even the naive hash gives pretty good results. Changing to optimise for speed - improves things too (to ~1600ms for 20 x search for all symbols and 10000 random strings)
As an exercise - setting the hash function to return a single value (SetHashFunction(4) )- gives a simple linked list search - and GenData isn't quick enough to avoid a WDT issue - storing a value requires searching to check if it exists - so if you want to test this, reduce the number of symbols generated (in GenData)
Martin
v10 - the v11 feature I use is the 'new' .x += 1 syntax (in this case increments .x) - other variants such as *= /= &= |= etc - are very intuitive to use (for C/C++/Python/Java etc programmers)
Replace these with the old .x = .x + 1 syntax to run in v10