From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Tremer To: location@lists.ipfire.org Subject: C library and database design Date: Fri, 08 Dec 2017 15:56:43 +0000 Message-ID: <1512748603.1472.120.camel@ipfire.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============5738487533528681058==" List-Id: --===============5738487533528681058== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hello, I have had a little bit of a head start and implemented a few things that I wanted to present here: I built a C library that is supposed to implement the core functionality or reading and writing the database as well as performing the lookups. https://git.ipfire.org/?p=3Dpeople/ms/libloc.git;a=3Dsummary So far there is an implementation of a string pool which will later hold all strings like the names of countries and ASes, etc. The pool will probably at = the end of the database and keeps all strings separated by a NULL byte. Therefore= it is easy to just jump to the right place and read the string from there until = you find NULL. That makes a string lookup perform in O(1) no matter how large the database is. When you add a string to the pool it is checking for duplicates so that adding the same string twice will make the pool only store it once and return the sa= me address both times. This operation is O(n) but that should be fine since the database is not going to be very very large (i.e. gigabytes) and we write it once and read if very often which makes this a good optimisation. We could potentially compress the string pool later if we needed to, but so f= ar I didn't see the point there and it is kind of nice to being able to open the database in a hex editor and see what is going on inside. So what does the database look like to far? [root(a)rice-oxley libloc]# hexdump -C test.db 00000000 4c 4f 43 44 42 58 58 00 00 00 00 00 00 00 00 0c |LOCDBXX.........| 00000010 00 00 00 18 00 00 01 55 54 65 73 74 20 56 65 6e |.......UTest Ven| 00000020 64 6f 72 00 4c 6f 72 65 6d 20 69 70 73 75 6d 20 |dor.Lorem ipsum | 00000030 64 6f 6c 6f 72 20 73 69 74 20 61 6d 65 74 2c 20 |dolor sit amet, | 00000040 63 6f 6e 73 65 63 74 65 74 75 72 20 61 64 69 70 |consectetur adip| 00000050 69 73 63 69 6e 67 20 65 6c 69 74 2e 20 50 72 6f |iscing elit. Pro| 00000060 69 6e 20 75 6c 74 72 69 63 65 73 20 70 75 6c 76 |in ultrices pulv| 00000070 69 6e 61 72 20 64 6f 6c 6f 72 2c 20 65 74 20 73 |inar dolor, et s| 00000080 6f 6c 6c 69 63 69 74 75 64 69 6e 20 65 72 6f 73 |ollicitudin eros| 00000090 20 75 6c 74 72 69 63 69 65 73 20 76 69 74 61 65 | ultricies vitae| 000000a0 2e 20 4e 61 6d 20 69 6e 20 76 6f 6c 75 74 70 61 |. Nam in volutpa| 000000b0 74 20 6c 69 62 65 72 6f 2e 20 4e 75 6c 6c 61 20 |t libero. Nulla | 000000c0 66 61 63 69 6c 69 73 69 2e 20 50 65 6c 6c 65 6e |facilisi. Pellen| 000000d0 74 65 73 71 75 65 20 74 65 6d 70 6f 72 20 66 65 |tesque tempor fe| 000000e0 6c 69 73 20 65 6e 69 6d 2e 20 49 6e 74 65 67 65 |lis enim. Intege| 000000f0 72 20 63 6f 6e 67 75 65 20 6e 69 73 69 20 69 6e |r congue nisi in| 00000100 20 6d 61 78 69 6d 75 73 20 70 72 65 74 69 75 6d | maximus pretium| 00000110 2e 20 50 65 6c 6c 65 6e 74 65 73 71 75 65 20 65 |. Pellentesque e| 00000120 74 20 74 75 72 70 69 73 20 65 6c 65 6d 65 6e 74 |t turpis element| 00000130 75 6d 2c 20 6c 75 63 74 75 73 20 6d 69 20 61 74 |um, luctus mi at| 00000140 2c 20 69 6e 74 65 72 64 75 6d 20 65 72 61 74 2e |, interdum erat.| 00000150 20 4d 61 65 63 65 6e 61 73 20 75 74 20 76 65 6e | Maecenas ut ven| 00000160 65 6e 61 74 69 73 20 6e 75 6e 63 2e 00 |enatis nunc..| The database starts with a magic value which I have set to LOCDBXX at the mom= ent and we need to find a better one probably. After that there is a version field that we change if the format changes. So we can have any changes of the datab= ase format later if we need to. Then, there is a pointer to a string of the "vendor" of the database. That allows us to set where the database is from and there is also a "description" pointer where we can just write some text and put useful information about the database. The pointers just point to a string in the string pool. Do we need more like those? I could imaging license and URL where this databa= se came from? But we could put this into the description, too. Thoughts? Then there is an offset where the string pool starts in the file (i.e. at whi= ch byte) and the length of the string pool. There is no trailer yet to make sure that the database hasn't been altered. I guess we should do that with some sort of a signature inside the database. Bu= t I haven't really thought about that too much. I want to involve everyone into t= he process of designing this and also peer-review as much as possible so that we really have a smart and flexible design that won't create any future problems. The library will get some bindings for other programming languages later. That allows us to use it easily in Python, Perl and what else people use. I have g= ood experience with writing the Python bindings. We will need perl for using this database in IPFire 2. Anything else should be contributed by third parties. The library itself is under LGPLv2.1 or later now which allows closed source applications to use it as well. It has some unit tests to load the database, etc. So far I am only depending on the C standard library and it should stay as th= at as long as we can because that makes porting the database to other OSes easie= r. If we want to use compression we might need other libraries and of course the language bindings need that their respective libraries. So this is where I am now. I haven't really made a plan for the next steps. So please send me comments and suggestions and I will try to draft out something. Best, -Michael --===============5738487533528681058== Content-Type: application/pgp-signature Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="signature.asc" MIME-Version: 1.0 LS0tLS1CRUdJTiBQR1AgU0lHTkFUVVJFLS0tLS0KCmlRSXpCQUFCQ2dBZEZpRUU1L3JXNWwzR0dl Mnlwa3R4Z0hudy8yK1FDUWNGQWxvcXRqc0FDZ2tRZ0hudy8yK1EKQ1FkK25RLy9mRFgvcDMzdUg1 Y3JqT2xzaWcraVIyNE9Pc0s0ZzVrWTJ5c2I4YWNJb081YlFaL2lXeWZLdUtuLwpheU9nMlRVdXZh aCtOU1QrUGw5UVIzSzgzemRHOFBKeEc4ZFlsOEpCSmxGOVkrUzgwNkV1TEpPb3ZiYk9adWlpClpl Rmh0MnplNVpQM24rNHhoREJDeDBqK1JzbkExNklaT1Qxa1VwVGx1dVpRQUgrVGw1dmV2NkdaM2l5 WnNqR28KU3FWSWxZYTNudXZsUkgxVUpYYTRQKzNkMVgxZGRRdFZEM3NjcSs3UHBLTFFTQ0JydGM3 THkxaklJelk4NWx4VQpxVVozVzFSRHF6QmIySUkrdlkrUi9adTR4ZEtJbCt5UURhRjlmZ0FPZTVH cXVXQ1NuRzliQzBnUExnVmhmSUEyCmFwL0Vzb0hFU2h3eWN5d2JiZk85SUlScmhhQ2E5UDl0dko2 eDZHVnUvcy9tU2wxU3NhTzZDSllMMXJaNnJrZ1QKTVY1NWdYWnJRUVorc1dMcDNZdE93bVBwQUxx cUpqcWdkOHU4cFdxWGc4eHB2OFI1T3lBVDlIRVJrSDVXRHN5ZgoxbkVXWnNrNlhQd3lleVJ4T3NH ZFdKYVpPWXBjNWd0WTArYkVsTDV1dUQ0SEUvR1Z4UFlEb2pLZG1MMXdQdThOCnc2U2hWUFIxY1B2 ZEFZREhNKzlqRjJQbmUxOXNtbzBUdE5ROTZacUtMM0tjYmhrWkNrZUR5Q043UzhPcEVYZW4KL3h1 dzg4VG9xSVBsbnd1Q2dUWm1PZTl5RHJhMy9rbWFuL3hhYnlZSVlSMmxVOFUrbzdEbmIzcTc2bjZX d0pyNQpIakQwdFVFM3ZINTlKWnRzd21CT01uNVZQRElabWJzbWRSTXA2R2ZMTkdrSk81VEwwQ1k9 Cj1MUkNQCi0tLS0tRU5EIFBHUCBTSUdOQVRVUkUtLS0tLQo= --===============5738487533528681058==--