Archive for May, 2010

Initial impressions on Fedora 13

As you already know, Fedora 13 is out! 😛 This time I decided to pay a little more attention to what I like and what I don’t like after a fresh installation, and write it down. Today was my first day experiencing Fedora 13 (though I had briefly checked the Fedora 13 Beta and a test candidate), so the list is not long. I also won’t talk about features that are highlighted well enough elsewhere, like the Fedora 13 Feature List.

* Installation:

I really liked the look and feel of the installer. As always, I had to deselect the “System clock uses UTC” option during installation. I think almost everybody using a dual-boot system is forced to do it, and I always recommend doing so, because I’ve frequently seen new users wondering why their clock is off when they switch between Windows and Fedora. Now I’m wondering whether I should report it as a bug…

Another change is in the initial package selection page, where you can now select only one category rather than multiple. I usually need office applications, internet applications and certainly development applications, so I decided to go with the Software Development selection. Surprisingly, in the package selection customization page I saw that many sections were not selected (e.g. Office); I wonder if they are brought in by a dependency, but it’s not nice to come up with a system without any office or multimedia applications installed.

And as always, Fedora Eclipse is not selected by default either (so you won’t get any IDEs if you don’t opt to customize the package set). I don’t like that at all!

The final note about the installer: it finally recognized my Windows partition correctly, rather than my recovery partition 😛

* Artwork: I really like the new artwork! Though there is still a long way to go…

* Package Management:

PackageKit now disables a repo it cannot access, rather than trying to connect forever or just stopping with an error message. It seems it is going to become a real option for me! However, if you are offline right after installation, you can’t get the list of installed packages at all (because the package manager does not have the group data). I would expect to see all installed packages in the “Others” group, but no success. The only way I could get some info was by doing a search. Maybe a search with a blank string would return the complete list of installed packages, but I didn’t try that.

Another problem is that if you change GNOME’s proxy settings, you must log off and log in again before the package manager will use the new settings. The user doesn’t care about backend/frontend details; it is not reasonable to force a logoff/login just because of a change in proxy settings…

* KDevelop 4:

One of the great things that landed on time for Fedora 13 was the first release of KDevelop 4. It is a complete rewrite of KDevelop (AFAIK) which took a long time and finally arrived recently. Unlike the previous version, it comes with advanced C++ editing/parsing capabilities. While it still lacks some features, it provides a set of really interesting ones. Long ago I switched from KDevelop to Eclipse, but I feel I should evaluate KDevelop and some other IDEs (Code::Blocks, NetBeans, Anjuta, …) again to see if any of them can really beat Eclipse. Anyway, check out the latest version of KDevelop and see if you like it!

Also, yum is apparently much faster in Fedora 13. I feel that it is more responsive and does the job faster. Very happy to see that!

OK, That’s enough for now!

My plans/ideas for yum! (part 1)

Well, there are some ideas about yum in my mind which I would like to implement, but unfortunately I have not found enough time for them yet, and I won’t be able to start working on them for at least the next 2 months. However, I hope to start on them afterwards (even if slowly). I’m writing them down here so that: 1. someone might decide to work on them! 2. I might receive some feedback/suggestions about my ideas, to improve the design and fix its flaws.

For many users yum works great, but when it comes to low-bandwidth internet connections, it is not so bright. The most annoying part in this regard is downloading repository metadata, which is downloaded in its entirety at regular intervals. And until recently, yum was sometimes completely unable to download a repository’s metadata: the connection would time out during the download and yum would start over from the beginning… Fortunately, that problem has been fixed.

Yum really wastes bandwidth by downloading lots of metadata, most of which is never used. For some people this is fine, as they prefer to spend bandwidth rather than CPU cycles (like the ones who don’t like delta RPMs); but if you don’t have good bandwidth, you won’t like it.

Considering that, I’ve decided to add a “low bandwidth” mode to yum so that users can select which mode they prefer. In this mode, a new kind of repository metadata will be used, designed with the following goals:

  • Metadata should be downloaded incrementally as much as possible; avoid downloading a single piece of metadata more than once.
  • Only the data that is actually needed should be downloaded.
  • Yum should not become (noticeably) slower.
  • No server-side processing is acceptable: a repository is a set of files and directories, just that. Any plain HTTP/FTP server should be able to host a repository.
  • Even if the bandwidth savings don’t happen in all use cases, it is still worthwhile if it works for the most common workflows (e.g. install/remove/update).
  • Security: transferred data should be verifiable.
  • Yum should still be able to do its job (e.g. resolving dependencies)!

Currently, a repository’s metadata is stored in a number of files; AFAIK the 3 biggest are the primary, filelists and other (if available) databases, stored as sqlite files which apparently give yum a fast way to query the data. Among those, the primary database is always downloaded; the others are downloaded if the need arises. Currently, the primary db of F12 is around 12MB and the primary db of F12 updates is around 5MB.
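To illustrate the kind of query involved — with a deliberately simplified, hypothetical schema; the real primary.sqlite produced by createrepo has many more columns — a sqlite lookup of the sort yum does might look like:

```python
import sqlite3

# Build a toy "primary" database in memory; the real schema is much richer,
# this is only a sketch of why sqlite makes yum's queries fast.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT, version TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?)",
    [("kdevelop", "4.0.0", "Integrated Development Environment for C++"),
     ("yum", "3.2.27", "RPM package installer/updater/manager")],
)

# The kind of direct lookup yum performs against the primary database.
row = conn.execute(
    "SELECT version, summary FROM packages WHERE name = ?", ("yum",)
).fetchone()
print(row[0])  # -> 3.2.27
```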

OK, in this post I’ll consider the primary database only. This database contains the following information: the list of package files in some directories (e.g. bin), and package information: name, summary, description, conflicts, obsoletes, provides, requires, and so on. But does yum really need all of this information about all packages to function? No. So what I’d like to do is split the metadata as far as possible (but not too much, as I’ll describe later) so that yum can avoid downloading data it doesn’t need.

Some people have said that using such methods might make yum much slower compared to using sqlite databases. But there is no need to use the same format on both the server and the client side. My ideas are generally about how the metadata is stored on the server side; yum could integrate any downloaded information into its local sqlite databases. The yum cache on the client side will use the same format in both low-bandwidth and high-bandwidth modes. The databases should contain a flag so that yum knows which data is available in its cache and which should be downloaded when needed. As a result, such a metadata split will not decrease yum’s performance when working with cached data (the only extra processing is checking whether each needed piece of data is available or not).
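A minimal sketch of that client-side idea (all the names here are hypothetical, not real yum APIs): the local cache records which pieces of metadata it already holds, and anything missing triggers a one-time download before the query proceeds.

```python
# Hypothetical sketch of a client cache with per-item availability flags.
class MetadataCache:
    def __init__(self):
        self.data = {}           # (package, kind) -> metadata blob
        self.available = set()   # flags: which (package, kind) pairs are cached
        self.downloads = 0       # how many times we hit the mirror

    def fetch_from_mirror(self, package, kind):
        # Stand-in for an HTTP download of one small file from a plain mirror.
        self.downloads += 1
        return f"<{kind} of {package}>"

    def get(self, package, kind):
        key = (package, kind)
        if key not in self.available:           # the "flag" check
            self.data[key] = self.fetch_from_mirror(package, kind)
            self.available.add(key)             # downloaded once, kept forever
        return self.data[key]

cache = MetadataCache()
cache.get("foo-1.0.0-1.fc12", "requires")  # first call downloads
cache.get("foo-1.0.0-1.fc12", "requires")  # second call hits the cache
print(cache.downloads)  # -> 1
```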

Now, let me be a bit more specific about how the primary database can be split. The minimum required information for yum is probably the list of packages (package summaries might be included here too). For each package, its summary and description can be stored in a separate file (e.g. foo-1.0.0-1.fc12-description)*. We can have separate directories for each locale, so that localized package summaries and descriptions can be provided too. If a user issues a yum info command, the summary and description of that package are downloaded (if not already cached) and displayed. But certainly, if the user wants to do a yum search, yum needs the summaries and descriptions of all packages; in that case, you’ll download them for all packages**. That is still better than the current situation. Also notice that this information never changes for a specific package version, so it will be downloaded only once.
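As a sketch of the naming scheme (the exact layout, directory names and URL here are hypothetical, just for illustration), the per-package, per-locale description files could be addressed like this:

```python
# Hypothetical URL layout for split description metadata on a plain mirror.
def description_url(repo_base, nevra, locale=None):
    # e.g. foo-1.0.0-1.fc12-description, optionally under a locale directory
    path = f"{nevra}-description"
    if locale:
        path = f"{locale}/{path}"
    return f"{repo_base}/descriptions/{path}"

print(description_url("http://mirror.example/f12", "foo-1.0.0-1.fc12"))
# -> http://mirror.example/f12/descriptions/foo-1.0.0-1.fc12-description
print(description_url("http://mirror.example/f12", "foo-1.0.0-1.fc12", "de"))
# -> http://mirror.example/f12/descriptions/de/foo-1.0.0-1.fc12-description
```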

Other package information (like its requirements) will be stored in separate files too. Currently, package requirements take considerable space in the primary repo (IIRC, removing the requirements information from the primary repo cuts its compressed size roughly in half), but in common workflows you only need a package’s requirements when you want to install that package. Again, this information is downloaded once per package.

When it comes to packages’ provides and file list information, splitting becomes harder: when you want to satisfy a package’s requirements, you might face file-based or capability-based dependencies, so you should be able to figure out which package provides a specific capability or file. I’ll describe my ideas on this in another post, but for now you can assume that yum downloads every package’s list of provided capabilities and files in specific directories (which are currently in the primary db). Even in that case, such lists are downloaded once per package (preserving the incremental metadata downloading). When new packages are added, their information is downloaded too.
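Whatever the final split looks like, the point is that a reverse index from capability or file to package must be constructible from the downloaded data; a toy sketch of that lookup:

```python
# Toy sketch: build a reverse index from capabilities and files to packages,
# the lookup needed when resolving file- or capability-based requirements.
provides = {
    "bar-2.0-1.fc12": ["libbar.so.2", "/usr/bin/bar"],
    "baz-1.1-3.fc12": ["libbaz.so.1"],
}

who_provides = {}
for pkg, caps in provides.items():
    for cap in caps:
        who_provides.setdefault(cap, []).append(pkg)

# Resolving a requirement, whether it names a capability or a file path:
print(who_provides["/usr/bin/bar"])  # -> ['bar-2.0-1.fc12']
print(who_provides["libbaz.so.1"])   # -> ['baz-1.1-3.fc12']
```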

That seems to be enough for now! Just two points:

* The above description might result in a large number of very small files. Considering that each file should be signed, this could add considerable overhead. But it is not really necessary to put each piece of information in a separate file: instead of one file per package, we can put the data of a number of packages (for example, every 10 packages, or every ~100KB chunk of data) in a single file (the file names will be included in the package list that yum downloads initially).
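The chunking idea can be sketched like this (the chunk size and the index format are arbitrary choices for illustration): group the packages’ metadata into roughly fixed-size files and record, alongside the initial package list, which chunk holds each package.

```python
# Sketch: pack per-package metadata blobs into ~fixed-size chunk files and
# build the package -> chunk index that ships with the initial package list.
CHUNK_LIMIT = 100 * 1024  # ~100KB per chunk (arbitrary choice)

def make_chunks(packages):
    chunks, index = [], {}
    current, size = [], 0
    for name, blob in packages:
        if current and size + len(blob) > CHUNK_LIMIT:
            chunks.append(current)          # close the full chunk
            current, size = [], 0
        index[name] = len(chunks)           # chunk number for this package
        current.append((name, blob))
        size += len(blob)
    if current:
        chunks.append(current)
    return chunks, index

pkgs = [(f"pkg{i}", b"x" * 40 * 1024) for i in range(5)]  # 40KB each
chunks, index = make_chunks(pkgs)
print(len(chunks))    # 5 * 40KB split at ~100KB -> 3 chunk files
print(index["pkg2"])  # -> 1 (pkg2 lands in the second chunk)
```

A client then downloads only the chunk files covering the packages it actually needs, and each chunk is signed once instead of signing thousands of tiny files.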

** Do we really need to do the search exclusively locally?! No. It’s true that we don’t want our mirrors to do any server-side processing, but:

1. The server-side search feature can be provided by a small number of Fedora infrastructure servers.

2. Even better (?!), a search engine like Google can do this for us! When a user issues a “yum search” command, yum can first search its local database, and then, instead of downloading all package descriptions, it can use a search engine, pointing it at a repoview URL on a mirror (or a new plain HTML format for package descriptions, more suitable for this kind of search), and show the results to the user. So you’ll get server-side processing using Google’s resources!

Wow! Much longer than what I intended 😛