Shared Dict API:

Module shared_atomic.shared_dict

The algorithm shared_dict uses for concurrent access is borrowed from, but not exactly the same as, ConcurrentHashMap in JDK 1.8. The implementation language was also changed to C++ so that it can be integrated with Python.

Parallel access and parallel expansion across hash buckets are achieved with separate locks. Automatic expansion starts when the number of elements in the shared_dict (len()) exceeds twice the number of hash buckets. After an expansion, whether automatic or triggered manually through the shared_dict.expansion() method, the number of hash buckets is twice the original. Ideally, the average bucket therefore holds at most about two elements, so the time to access an element stays effectively constant unless it is affected by manual expansion.
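The growth rule above can be illustrated with a toy model. This is a sketch under stated assumptions, not the library's C++ implementation: simulate_growth and its parameters are hypothetical names, and the model only tracks counts rather than real buckets or locks.

```python
def simulate_growth(initial_buckets: int, inserts: int) -> list:
    """Toy model of the documented growth rule (not the library's code).

    Returns (element_count, bucket_count) pairs recorded at each expansion.
    """
    buckets = initial_buckets
    expansions = []
    for count in range(1, inserts + 1):
        # Documented trigger: len() exceeds twice the number of hash buckets.
        if count > 2 * buckets:
            buckets *= 2  # Documented effect: the bucket count doubles.
            expansions.append((count, buckets))
    return expansions


history = simulate_growth(initial_buckets=4, inserts=40)
for count, buckets in history:
    print(f"expanded at {count} elements -> {buckets} buckets")
```

In this model the ratio of elements to buckets never exceeds two before an expansion brings it back down, which is the property the paragraph above relies on.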

The input_key and the input_value must be picklable, and the input_key must also be a hashable Python object. If the shared_dict is shared across multiple processes, the hash algorithm must produce the same result for the same input_key in every process. In CPython, built-in types other than integer and float can produce different hash results across processes (for example, str and bytes hashes are randomized per interpreter start unless PYTHONHASHSEED is fixed), so use subclasses or custom classes that override the __hash__() method.
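One possible way to meet the hashing requirement is a small key class whose hash is derived from a fixed digest. The sketch below is an illustration only: StableKey is a hypothetical name, and the choice of BLAKE2b from hashlib is an assumption, not something the library prescribes.

```python
import hashlib


class StableKey:
    """Hypothetical key wrapper whose hash does not vary between processes."""

    def __init__(self, text: str):
        # Only a plain str attribute is stored, so default pickling applies
        # when the class lives in an importable module.
        self.text = text

    def __hash__(self) -> int:
        # Derive the hash from the key's bytes with a fixed digest, so it does
        # not depend on CPython's per-process hash randomization.
        digest = hashlib.blake2b(self.text.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big", signed=True)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, StableKey) and self.text == other.text


first = StableKey("user:42")
second = StableKey("user:42")
print(hash(first) == hash(second), first == second)
```

Defining __eq__ alongside __hash__ keeps the two consistent, so equal keys always land in the same bucket.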

shared_atomic.shared_dict.dict_get(target: shared_dict, input_key) -> list

get the target object from shared_dict with input_key

param target:

the target shared_dict

param input_key:

the target key value

return:

list in the form [Boolean, object]. The first element is a boolean indicating whether the input_key is in the dict. The second element is the target object if the first element is True.
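A caller usually wants either the value or a fallback, so a small helper can unpack the documented [Boolean, object] result. This is a minimal sketch: unwrap is a hypothetical helper, and the lists below are stand-ins shaped like the documented return value rather than output from the library. What the second element holds when the key is missing is not specified here, so None is used as a placeholder.

```python
from typing import Any


def unwrap(result: list, default: Any = None) -> Any:
    """Turn a documented [found, value] result into a value or a default."""
    found, value = result
    return value if found else default


# Stand-in results; a real call would be dict_get(target, input_key).
hit = unwrap([True, "alice"])
miss = unwrap([False, None], "n/a")
print(hit, miss)
```

Checking the flag rather than the value lets a stored None be told apart from a missing key.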

shared_atomic.shared_dict.dict_insert(target: shared_dict, input_key, input_value) -> int

insert input_key and input_value into the target shared_dict

param target:

the target shared_dict

param input_key:

the target key value

param input_value:

python object to insert

return:

1 if successful

shared_atomic.shared_dict.dict_remove(target: shared_dict, input_key) -> int

remove the target object from shared_dict with input_key

param target:

the target shared_dict

param input_key:

the target key value

return:

1 if successfully removed, 0 if the input_key is not in the target shared_dict

class shared_atomic.shared_dict.shared_dict

class providing a shared dict

__init__(self, name: bytes = None, size: int = 650000000, bucket_chunk_size_exponent: int = 20, bucket_chunk_number: int = 100, chunk_size_exponent: int = 5)

Constructor of shared_dict. The maximum number of hash buckets is (2**bucket_chunk_size_exponent) * bucket_chunk_number. The initial number of hash buckets is 2**bucket_chunk_size_exponent, which means only one bucket chunk is initialized.

param name:

file name of the shared dict

param size:

file size of the shared dict

param bucket_chunk_size_exponent:

the base-2 exponent of the hash bucket chunk size; for example, 20 means the hash bucket chunk size is 2**20=1048576.

param bucket_chunk_number:

the number of the hash bucket chunks, for example, 100 means there are 100 hash bucket chunks; the size of each of them is determined by the previous parameter bucket_chunk_size_exponent.

param chunk_size_exponent:

the base-2 exponent of the data chunk size used when allocating from shared memory; for example, 5 means each data chunk allocated from shared memory is 2**5=32 bytes.
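The sizes implied by the constructor parameters can be worked out directly from the formulas above. The helper below is a sketch for checking the arithmetic only; bucket_capacity is a hypothetical name and does not call the library.

```python
def bucket_capacity(bucket_chunk_size_exponent: int = 20,
                    bucket_chunk_number: int = 100,
                    chunk_size_exponent: int = 5) -> dict:
    """Derive the sizes described for the shared_dict constructor parameters."""
    chunk_buckets = 2 ** bucket_chunk_size_exponent
    return {
        "initial_buckets": chunk_buckets,                    # one chunk at start
        "max_buckets": chunk_buckets * bucket_chunk_number,  # all chunks in use
        "data_chunk_bytes": 2 ** chunk_size_exponent,
    }


defaults = bucket_capacity()
print(defaults)
```

With the default arguments this gives 1048576 initial hash buckets, at most 104857600 hash buckets, and 32-byte data chunks.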

expansion(self, parallelism: int = 0) -> int

expand the shared_dict in parallel

param parallelism:

the degree of parallelism; if 0, the number of logical CPUs the calling thread is restricted to is used, same as os.cpu_count().

return:

1 if successfully expanded

get(self, input_key) -> list

get the target object from shared_dict with input_key

param input_key:

the target key value

return:

list in the form [Boolean, object]. The first element is a boolean indicating whether the input_key is in the dict. The second element is the target object if the first element is True.

insert(self, input_key, input_value) -> int

insert input_key and input_value into the shared_dict

param input_key:

the target key value

param input_value:

python object to insert

return:

1 if successful

remove(self, input_key) -> int

remove the target object from shared_dict with input_key

param input_key:

the target key value

return:

1 if successfully removed, 0 if the input_key is not in the target shared_dict
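The return conventions of get, insert and remove can be wrapped to give mapping-style syntax. The sketch below is an assumption-laden illustration: DictView and FakeSharedDict are hypothetical names, and FakeSharedDict is an in-memory stand-in that only mimics the documented return values (it also assumes that inserting an existing key overwrites it, which this section does not state). A real shared_dict instance could be passed to DictView in its place.

```python
class DictView:
    """Hypothetical adapter giving a shared_dict-like object mapping syntax."""

    def __init__(self, backend):
        self._backend = backend  # anything with get/insert/remove as documented

    def __getitem__(self, key):
        found, value = self._backend.get(key)
        if not found:
            raise KeyError(key)
        return value

    def __setitem__(self, key, value):
        if self._backend.insert(key, value) != 1:
            raise RuntimeError("insert did not report success")

    def __delitem__(self, key):
        if self._backend.remove(key) == 0:
            raise KeyError(key)


class FakeSharedDict:
    """In-memory stand-in mimicking the documented return conventions."""

    def __init__(self):
        self._data = {}

    def get(self, input_key):
        if input_key in self._data:
            return [True, self._data[input_key]]
        return [False, None]

    def insert(self, input_key, input_value):
        self._data[input_key] = input_value  # assumption: overwrite on re-insert
        return 1

    def remove(self, input_key):
        if input_key in self._data:
            del self._data[input_key]
            return 1
        return 0


view = DictView(FakeSharedDict())
view[1] = "one"
print(view[1])
del view[1]
```

Raising KeyError for a missing key mirrors the behaviour of the built-in dict, which can make calling code easier to read than checking flags and integer codes at each call site.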