metadata cache
Anders F Björklund
afb at algonet.se
Sun Feb 21 04:34:15 PST 2010
>> Would it be possible to 'split' the cache into two files, e.g.
>> essential info and additional info. The essential info would
>> basically be the current metadata cache but only holding what is
>> absolutely necessary and the additional info would be looked up
>> only when needed (like description, group and such like). The idea
>> being that this would reduce the size of the main cache that is used.
>
>
> It would be possible to do a separate SQL index, that would map
> "pkgKey" into "file offset".
I meant to write "pkgId", not "pkgKey" (the key is internal to the
sqlite database, not external):
CREATE TABLE packages (
    pkgKey INTEGER PRIMARY KEY,
    pkgId TEXT,
    name TEXT,
    arch TEXT,
    version TEXT,
    epoch TEXT,
    release TEXT,
    summary TEXT,
    description TEXT,
    url TEXT,
    time_file INTEGER,
    time_build INTEGER,
    rpm_license TEXT,
    rpm_vendor TEXT,
    rpm_group TEXT,
    rpm_buildhost TEXT,
    rpm_sourcerpm TEXT,
    rpm_header_start INTEGER,
    rpm_header_end INTEGER,
    rpm_packager TEXT,
    size_package INTEGER,
    size_installed INTEGER,
    size_archive INTEGER,
    location_href TEXT,
    location_base TEXT,
    checksum_type TEXT);
Adding an index to the already existing xml also saves having to
duplicate all the information.
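For comparison, the separate SQL index mentioned in the quote could be as small as the sketch below. This is only an illustration: the table name pkg_offset and the helper names store_index and lookup_offset are made up here and are not part of Smart or of the yum sqlite schema. It keeps just the pkgId and the byte offset, so none of the other columns need to be duplicated.

```python
import sqlite3


def store_index(conn, entries):
    """Store (pkgId, offset) pairs in a small lookup table.

    The table name "pkg_offset" is hypothetical, chosen for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pkg_offset ("
        " pkgId TEXT PRIMARY KEY,"
        " offset INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pkg_offset (pkgId, offset) VALUES (?, ?)",
        entries,
    )
    conn.commit()


def lookup_offset(conn, pkg_id):
    """Return the byte offset recorded for pkg_id, or None if unknown."""
    row = conn.execute(
        "SELECT offset FROM pkg_offset WHERE pkgId = ?", (pkg_id,)
    ).fetchone()
    return row[0] if row else None
```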
> Just that ElementTree doesn't help much with this, so it would need
> a separate indexing run.
Added a (pyexpat) index creator in
http://bazaar.launchpad.net/~afb/smart/metadata/revision/941
If you run it on each repodata file, the index looks like:
# tests/data/rpm/repodata/primary.xml.gz
781a4605a429eb27846f0234657f84f1a5831696 156
b70ad189a33ba47c50f368475458b0fc19630f5f 1981
# tests/data/rpm/repodata/filelists.xml.gz
781a4605a429eb27846f0234657f84f1a5831696 113
b70ad189a33ba47c50f368475458b0fc19630f5f 289
# tests/data/rpm/repodata/other.xml.gz
781a4605a429eb27846f0234657f84f1a5831696 109
b70ad189a33ba47c50f368475458b0fc19630f5f 259
Where the number is the byte offset to the <package>.
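A rough sketch of how such an index can be produced with pyexpat follows. It is not the code from the revision above, and the function name build_index is made up for illustration. It assumes the pkgId is either a pkgid attribute on <package> (as in filelists and other) or the text of a <checksum pkgid="YES"> child (as in primary), and that the stream passed in is already uncompressed.

```python
import io
from xml.parsers import expat


def build_index(stream):
    """Return a list of (pkgid, byte_offset) pairs, one per <package>.

    `stream` is a binary file-like object holding the uncompressed XML.
    """
    index = []
    state = {"offset": None, "in_checksum": False, "text": []}
    parser = expat.ParserCreate()

    def start(name, attrs):
        if name == "package":
            # Inside a handler, CurrentByteIndex points at the "<" of
            # the tag that triggered the callback.
            state["offset"] = parser.CurrentByteIndex
            if "pkgid" in attrs:
                # filelists/other style: the id is an attribute.
                index.append((attrs["pkgid"], state["offset"]))
                state["offset"] = None
        elif (name == "checksum" and attrs.get("pkgid") == "YES"
              and state["offset"] is not None):
            # primary style: the id is the checksum element's text.
            state["in_checksum"] = True
            state["text"] = []

    def chars(data):
        if state["in_checksum"]:
            state["text"].append(data)

    def end(name):
        if name == "checksum" and state["in_checksum"]:
            pkgid = "".join(state["text"]).strip()
            index.append((pkgid, state["offset"]))
            state["in_checksum"] = False
            state["offset"] = None

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.ParseFile(stream)
    return index
```

For a .gz file, something like build_index(gzip.open(path, "rb")) should work, with the offsets then referring to the uncompressed data.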
So now it doesn't need to scan the entire file, but it can seek
directly to the element start...
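The lookup side can then be sketched as below. Again this is an illustration and not the Smart code: read_package is a made-up name, and it simply assumes that the literal text </package> does not occur inside the element's own content.

```python
import io

END_TAG = b"</package>"


def read_package(stream, offset, chunk_size=4096):
    """Seek to `offset` and return the bytes of one <package> element.

    `stream` is a seekable binary file-like object with the
    uncompressed XML; `offset` comes from the index.
    """
    stream.seek(offset)
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("no closing </package> after offset %d" % offset)
        buf += chunk
        end = buf.find(END_TAG)
        if end != -1:
            # Cut right after the closing tag, dropping any extra bytes read.
            return buf[:end + len(END_TAG)]
```

The returned fragment can be handed to an XML parser on its own. For primary.xml it would probably need to be wrapped in a root element that declares the rpm namespace first. Seeking in a gzip file object works too, but Python emulates it by decompressing from the start, so it may be slow on large files.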
--anders