Fun with Flowcode - v11 - Hash Functions

Use this section to discuss your embedded Flowcode projects.
Post Reply
mnfisher
Valued Contributor
Posts: 1690
http://meble-kuchenne.info.pl
Joined: Wed Dec 09, 2020 9:37 pm
Has thanked: 146 times
Been thanked: 789 times

Flowcode v11 Fun with Flowcode - v11 - Hash Functions

Post by mnfisher »

First up - I'd like to congratulate the Matrix team on the release of v11. There are lots of great new features to try - they have worked very hard - so I hope they are all off to the pub tonight for a well earned pint!

I'd also like to be the first to offer a v11 specific project (but it's only slightly so :-) )

So - in the forums hash functions were mentioned (kind of in passing by Ben - and then by me) - so I thought I'd have a play!

The problem - storing 'tokens' (or a name) and a value is a very common one in computing - and speed of retrieval (or not if the token doesn't exist) is important. To do this - one technique is to combine the letters of a string into a hash value - this can then be used to divide the data into smaller sections to search.
Here I test 4 hash functions - a naïve add the characters of the string, HashDJB2, HashSDBM, and Hash FNV_1A - a quick google will reveal more if you are interested.
The value returned is used to split the data strings across 512 (as here) - buckets where each bucket holds a linked list of strings, values and pointers to the next string (or 0 at the end of the list) These are held in a large array of bytes (token_data) - I've used an esp32 - as it has plentiful RAM..

I generate as many random strings as will fit into the table - and then search for all of the strings
20x and 100000 random strings - some of which will exist and some not. Then report on the hash table statistics and time.

This gives :
Testing (fill table with random data)
Using Hash
Generating random symbols

Took 2634ms

Hash Table Statistics

Buckets in use = 509
Max bucket size = 38
Min bucket length = 0
Total tokens = 5877

Testing (fill table with random data)
Using HashDJB2
Generating random symbols

Took 2417ms

Hash Table Statistics

Buckets in use = 512
Max bucket size = 22
Min bucket length = 3
Total tokens = 5880
.....
Note - the number of tokens varies (random strings) - but as I don't alter the seed for random - is consistent over runs.
GetBucket uses a switch - to select the hash function - and in the real world would just call 'HashFn'. I also haven't attempted to align data with word / long word boundaries - and just store a 16 bit value for each token - this would possibly be a pointer to a struct in a real application.

Strings are randomly generated - for example - if it has just one char, there are only 26 options ('A'..'Z') - this means there are duplicates - here they are ignored - though other strategies may replace the value etc

This - demonstrates how fast MCUs have become - and that unless large numbers of searches are required - even the naive hash gives pretty good results. Changing to optimise for speed - improves things too (to ~1600ms for 20 x search for all symbols and 10000 random strings)

As an exercise - setting the hash function to return a single value (SetHashFunction(4) )- gives a simple linked list search - and GenData isn't quick enough to avoid a WDT issue - storing a value requires searching to check if it exists - so if you want to test this, reduce the number of symbols generated (in GenData)

Martin

v10 - the v11 feature I use is the 'new' .x += 1 syntax (in this case increments .x) - other variants such as *= /= &= |= etc - are very intuitive to use (for C/C++/Python/Java etc programmers)
Replace these with the old .x = .x + 1 syntax to run in v10
Attachments
Hash.fcfx
(47.75 KiB) Downloaded 11 times

Post Reply