Loading pathlists and changelogs

Anders F Björklund afb at algonet.se
Sun Sep 28 05:36:53 PDT 2008


Currently, pathlist and changelog are not implemented
for non-installed packages (i.e. they just return []):

     def getPathList(self):
         return []

     def getChangeLog(self):
         return []


To make them available for e.g. the rpm-md and apt-deb
loaders too, there needs to be some kind of mechanism
to download the extra metadata when it is required,
i.e. not only read from the cache, but trigger a fetch.

In the current implementation, "filelists.xml.gz" and
"other.xml.gz" are always downloaded with the repodata.
This increases the load time, as they are typically 10x
the size of the "primary.xml.gz" metadata, and they are
fetched even if not needed.

It totally breaks down for the APT channels, however,
which store their pathlists and changelogs outside of
the repo data and even on a package-per-package basis.
So downloading them for all packages is not really doable.

Example:
http://changelogs.ubuntu.com/changelogs/pool/universe/s/smart/smart_0.52-2/changelog
http://packages.ubuntu.com/hardy/i386/smartpm/filelist
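
A rough sketch of how such per-package URLs could be built.
The URL layout is inferred only from the two examples above,
the function names are made up for illustration, and the
"lib" prefix rule is an assumption based on the usual Debian
pool layout. Epochs in version strings are not handled.

```python
CHANGELOG_BASE = "http://changelogs.ubuntu.com/changelogs/pool"
FILELIST_BASE = "http://packages.ubuntu.com"


def changelog_url(component, source, version):
    """Build a changelog URL for a source package (hypothetical helper)."""
    # Assumption: pool directories use the first letter of the source
    # name, or the first four characters for "lib*" sources.
    prefix = source[:4] if source.startswith("lib") else source[:1]
    return "%s/%s/%s/%s/%s_%s/changelog" % (
        CHANGELOG_BASE, component, prefix, source, source, version)


def filelist_url(release, arch, package):
    """Build a file list URL for a binary package (hypothetical helper)."""
    return "%s/%s/%s/%s/filelist" % (FILELIST_BASE, release, arch, package)


if __name__ == "__main__":
    print(changelog_url("universe", "smart", "0.52-2"))
    print(filelist_url("hardy", "i386", "smartpm"))
```

Note that the changelog is keyed by source package and version,
while the file list is keyed by release, arch and binary package,
so a loader would need both sets of fields at hand.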

So the first call to getChangeLog() or getPathList()
would trigger the loader to fetch the external data,
and then parse those files from the downloaded cache.
Just like it already does with the RPM db and DEB files*?

* : that would be "changelog.Debian.gz" and "*.info",
as in https://code.launchpad.net/~afb/smart/changelog
Upstream code for loading apt-deb repos can be found at:
https://code.launchpad.net/~glatzor/python-apt/consolidate
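
The "fetch on first call" idea above could look roughly like
this. It is only a sketch: LazyPackageInfo and the fetch
callable are hypothetical and not part of the existing Smart
loaders, and returning plain lines is an assumption about
what callers expect.

```python
class LazyPackageInfo:
    """Hypothetical info object that fetches extra metadata on demand."""

    def __init__(self, name, version, fetch):
        self._name = name
        self._version = version
        # Hypothetical callable: fetch(kind, name, version) -> text.
        # A real loader would download to the cache and read the file.
        self._fetch = fetch
        self._cache = {}

    def _load(self, kind):
        # Only the first call for each kind triggers the fetch;
        # later calls are served from the in-memory cache.
        if kind not in self._cache:
            self._cache[kind] = self._fetch(kind, self._name, self._version)
        return self._cache[kind]

    def getPathList(self):
        return [line for line in self._load("filelist").splitlines() if line]

    def getChangeLog(self):
        return [line for line in self._load("changelog").splitlines() if line]


if __name__ == "__main__":
    def fake_fetch(kind, name, version):
        # Stand-in for a real download, for demonstration only.
        print("fetching %s for %s %s" % (kind, name, version))
        return "/usr/bin/smart\n" if kind == "filelist" else "smart (0.52-2)\n"

    info = LazyPackageInfo("smart", "0.52-2", fake_fetch)
    print(info.getPathList())
    print(info.getPathList())
```

Packages whose pathlist or changelog is never asked for then
cost nothing extra at load time.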


It's still quite slow to search for a particular file
or changelog date due to the linear structure, though.

It would need to be split by dirname/basename and by
name/time/text and return something other than a list.
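
One possible shape for such a non-list return value, as a
sketch only. index_paths, ChangeEntry and ChangeLogIndex are
made-up names, and the time field is assumed to be a plain
comparable number such as seconds since the epoch.

```python
import bisect
import posixpath
from collections import defaultdict, namedtuple

# Hypothetical record for one changelog entry.
ChangeEntry = namedtuple("ChangeEntry", ["name", "time", "text"])


def index_paths(paths):
    """Split paths into dirname -> basenames and basename -> dirnames."""
    by_dir = defaultdict(set)
    by_base = defaultdict(set)
    for path in paths:
        dirname, basename = posixpath.split(path)
        by_dir[dirname].add(basename)
        by_base[basename].add(dirname)
    return by_dir, by_base


class ChangeLogIndex:
    """Changelog entries kept sorted by time for binary search."""

    def __init__(self, entries):
        self._entries = sorted(entries, key=lambda entry: entry.time)
        self._times = [entry.time for entry in self._entries]

    def since(self, time):
        # Binary search for the first entry at or after the given time.
        return self._entries[bisect.bisect_left(self._times, time):]


if __name__ == "__main__":
    by_dir, by_base = index_paths(["/usr/bin/smart", "/usr/bin/python"])
    print(sorted(by_dir["/usr/bin"]))
    print(sorted(by_base["smart"]))
```

Looking up a basename or a starting date then avoids walking
the whole list, at the cost of building the index once.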

--anders



