
Boot OpenSolaris into Single User Mode June 12, 2008

Posted by anonanon in opensolaris.

Update: I thought I had left this in a draft state – my main goal in writing it was to test WordPress and ScribeFire together. Unluckily ScribeFire wasn’t working too well for me. Luckily, The Observatory picked up where I left off.  Check it out for working screenshots.

I was looking through my Google Analytics stats and realized that I’ve had a ton of hits from people trying to figure out how to boot OpenSolaris into single-user mode. This post should do the trick for you if you are on x86.

As you are booting the system, you will see a menu that looks something like:

Use the ‘e’ key to edit the command-line, then the down arrow to get to the line highlighted.

Use the ‘e’ key to edit this line. Add ‘-s’ to the end, as shown below.

Hit the Enter key and you are taken back to this screen. Notice that the -s appears on the line you just edited.

Now just use ‘b’ to boot into single-user mode.
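
For reference, since the screenshots are not shown here: on a ZFS-root OpenSolaris 2008.05 install, the GRUB entry being edited typically looks something like the sketch below. The title and bootfs values are assumptions and will differ from system to system; the only part that matters is the trailing -s on the kernel$ line.

```
# Typical entry after editing; only the trailing -s was added
title OpenSolaris 2008.05
bootfs rpool/ROOT/opensolaris
kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -s
module$ /platform/i86pc/$ISADIR/boot_archive
```

The edit made this way lasts for one boot only; the menu file on disk is not changed.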


Solaris Wish List: Make it Open Source March 22, 2008

Posted by anonanon in opensolaris, solaris, wish list.

I frequently see references from Sun and those that quote some Sun people as saying that Solaris is Open Source.

What is Solaris?

There are a few different flavors of things that people may refer to as Solaris:

  • Solaris 10 and earlier versions. This is what most references to the word Solaris seem to be referring to. Each release of Solaris is supported for a number of years through commercial and no-cost channels, to varying degrees.
  • Solaris Express. This is a collection of software that is viewed as a fairly stable distribution based upon the development branch of Solaris. Very limited support (“installation and configuration as well as for developer assistance”) is available, and the support period seems to be limited to around 3 months per release.
  • OpenSolaris. This is best summarized as “The main difference between the OpenSolaris project and the Solaris Operating System is that the OpenSolaris project does not provide an end-user product or complete distribution. Instead it is an open source code base, build tools necessary for developing with the code, and an infrastructure for communicating and sharing related information. Support for the code will be provided by the community; Sun offers no formal support for the OpenSolaris product in either source or binary form.”

What is Open Source?

The annotated Open Source Definition covers this quite well.

Is Solaris Open Source?

Let’s see if the license used by Solaris aligns with the Open Source definition.

  • Free redistribution: No. The Software License Agreement states: “You may make a single archival copy of Software, but otherwise may not copy, modify, or distribute Software.”
  • Source Code: No. The source code for Solaris is not available. Note that while a bunch of code is available at src.opensolaris.org, this is not the same source code that is used for building Solaris. If I want or need to modify the behavior of Solaris, there is no straightforward way to do so.
  • Derived Works: No. Since I may not copy, modify, or distribute Solaris, this point is moot.
  • Integrity of Author’s Source Code: No. Since the source code is not available, this point is also moot.
  • No Discrimination Against Persons or Groups: I think so. See section 11 of the license agreement for export restrictions.
  • No Discrimination Against Fields of Endeavor: I think so. While Sun’s lawyers don’t want to suggest running your nuclear power plant with Solaris, they don’t say that you can’t.
  • Distribution of License: No. Since redistribution is not allowed, this point is moot.

Since one or more of the requirements to be called Open Source are not met by the license under which Solaris is distributed, Solaris is not open source.

People that need support don’t want source code.

That is great in theory, but in practice it falls down a bit. Let’s pretend that you need to write a dtrace script to dig into a thorny performance problem. If the stable dtrace providers don’t provide probes at the right spot, you need to fall back on fbt or pid probes. The only way to understand what these are tracing is to read the source. Since the OpenSolaris and Solaris branch point is now about 3 years old, this is becoming extremely difficult to do reliably. The code that you get by browsing the latest OpenSolaris code sometimes does not match up with the fbt probes that are available in any release of Solaris. You may have more luck looking at historical versions of the source code, but that is a guessing game at best.

There are also times when a customer’s needs just do not align with what Sun is willing to offer. Suppose you need different functionality from a device driver. It is possible that it is a trivial change from the customer’s standpoint, so long as they have current source code. However, if the source code is not available, the best the customer can do is grab the same driver from somewhere else (e.g. opensolaris.org), try to maintain a special version of the driver, and provide custom ports of all the bug fixes that they would otherwise get from the Solaris source.

Suppose that the hypothetical fix the customer needed was something that Sun agreed was needed but did not have the time to develop. If the Solaris source code were as open as the OpenSolaris source code, the customer could work with the OpenSolaris community to get the fix integrated into OpenSolaris, then provide the backport to Solaris. If the customer could do this, the code would see the appropriate code reviews, development branch burn-in, etc. with minimal additional workload on Sun.

My Wish

I know that Sun went through tremendous work to make OpenSolaris happen. They should be commended for that and the other tens (hundreds?) of millions of lines of code they have opened up or written from scratch in the open. This gives them tremendous opportunity with Solaris 11 (assuming that 10+1 = 11). Keep the code that is open, well, open. When patches or other updates are released, be sure that it is clear in the open source code repository which files are used in building that update. To get the full benefit of this, it should be possible for the Solaris customer to set up a build environment to build this source.

It is OK if Sun doesn’t want to support the code I modify. However, I would expect that they would support the unmodified parts of Solaris much the same as they do if I install a third-party package that adds some device drivers and mucks with some files in /etc.

OpenSolaris Wish List: Feature-based meta packages March 16, 2008

Posted by anonanon in installation, opensolaris, packaging, solaris.

The problem

There are many occasions when an administrator needs to install a particular feature on a system but tracking down the packages, patches, or revisions thereof is next to impossible. Consider the following scenarios:

  • A sysadmin is trying to build minimized systems that have a firewall, bash, python, and can support zones using Jumpstart. What needs to appear in the Jumpstart profile?
  • That same sysadmin then decides that a particular installed system needs to be augmented to support the OSPF routing protocol.
  • Some time passes and an updated version of a feature is released. The sysadmin needs to be able to update systems to support that feature without guessing which packages or patches are required. Note that a subset of the required packages may not have been installed initially because they did not exist at the time or they weren’t required to meet the dependencies of the earlier feature set.
  • A sysadmin notices a nifty feature mentioned in the “What’s New” documentation and wants to determine if a recently patched system supports the feature.

Dealing with any of these situations is very difficult today.

Most commonly, the best advice that is offered when trying to indicate what is required to have a particular feature is to say that the feature is available in Solaris X update Y. While that is true to a certain extent, that is not the complete story. It assumes that the system in question is in no way minimized and ignores the fact that the same patches that are integrated in update Y are also available for installation on releases before update Y.

Ongoing Work

The Image Packaging System looks to be outstanding work in this direction. The one-pager is likely the best place to look for a quick overview. While this will provide a good foundation, there are numerous posts to mailing lists of the form “I’m trying to install X, but pkg can’t find it.”

Suggested Approach

Some geeks (power users, whatever you would like to call them) will know the exact name of the packages that they want or will otherwise be able to search for them rather adeptly. Having the ability to say “I want bash” rather than “I want the same shell used by most Linux distros” is important.

I suspect that most people will be quite happy looking at their requirements and matching them up to advertised features. I suggest that the feature/* package namespace be reserved for meta-packages that represent features. A feature meta-package should not deliver any files, rather it just defines dependencies for the minimal set of packages required to support the feature. See the Package FMRIs and Versions section of pkg(5) for details on package naming.

For instance, if I needed the zones feature I should be able to say something along the lines of “pkg install feature/zones”. This would likely correspond to an FMRI of pkg://opensolaris.org/feature/zones@5.11,5.11-7:20080326T164523Z. When a feature is updated, the build number (7 in the example FMRI) should be incremented.
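
To make the pieces of that FMRI concrete, here is a small sketch that pulls it apart with plain shell parameter expansion. It assumes the layout pkg://publisher/name@component,build-branch:timestamp; the function and variable names are made up for illustration, and the field this post calls the build number shows up as the branch under that naming.

```sh
#!/bin/sh
# Split a package FMRI of the form
#   pkg://<publisher>/<name>@<component>,<build>-<branch>:<timestamp>
# parse_fmri and the fmri_* variable names are hypothetical.
parse_fmri() {
    rest=${1#pkg://}                # drop the scheme
    fmri_publisher=${rest%%/*}      # text before the first "/"
    rest=${rest#*/}
    fmri_name=${rest%%@*}           # package name, may itself contain "/"
    rest=${rest#*@}
    fmri_timestamp=${rest##*:}      # text after the last ":"
    rest=${rest%:*}
    fmri_branch=${rest##*-}         # text after the last "-"
    rest=${rest%-*}
    fmri_component=${rest%%,*}      # text before the ","
    fmri_build=${rest#*,}           # text after the ","
}

parse_fmri 'pkg://opensolaris.org/feature/zones@5.11,5.11-7:20080326T164523Z'
printf '%s\n' "publisher: $fmri_publisher" "name:      $fmri_name" \
    "component: $fmri_component" "build:     $fmri_build" \
    "branch:    $fmri_branch" "timestamp: $fmri_timestamp"
```

With something like this, a script could compare the installed feature/zones branch against the one named in the documentation.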

While searching the published packages would yield a pretty good list of packages required for supporting zones, someone needing the OSPF routing protocol may have a harder time with a search. However, installing feature/network/routing/ospf would pull in the zebra or quagga package as appropriate (varying by Solaris release), without the sysadmin needing to know which one provides OSPF.

Integration with Documentation

With such an approach, the Solaris What’s New documentation could be very helpful to sysadmins needing to update existing systems, installation profiles, etc. Consider the following, based upon a particular entry from the Solaris 10 What’s New documentation.

IPsec Tunnel Reform

Solaris now implements IPsec Tunnel Mode per RFC 2401. Inner-packet selectors can be specified on a per-tunnel-interface basis using the new “tunnel” keyword of ipsecconf(1M). IKE and PF_KEY handle Tunnel Mode identities for Phase 2/Quick Mode. Interoperability with other IPsec implementations is greatly increased.

For more information, see Transport and Tunnel Modes in IPsec in System Administration Guide: IP Services. This feature is available via the feature/network/protocol/rfc2401 software collection.

To help someone with this feature find documentation of this type in the future, a documentation system (installed on system or internet accessible) should be able to point a person interested in documentation related to a feature to the appropriate man page(s), online books, and other documentation.

Similarly, if the feature is just an update of an existing feature, having the documentation refer to a particular build (release, whatever works) of the feature package would be most useful.

Future of OpenSolaris Boot Environment management March 12, 2008

Posted by anonanon in Uncategorized.

I was quite happy to see this recent post from Ethan Quach proposing an efficient method for sharing the variable parts of /var. It bears a striking resemblance to something that I suggested and clarified in the past.

But why does this matter? When you are making significant changes to the system, such as during a periodic patch cycle or upgrade, it is generally desirable to…

  1. be able to do so without taking the system down for the duration of the process
  2. be able to abort the operation if you have a change of heart
  3. be able to fail back if you realize that newer isn’t better

Consider what is in /var:

  • Mail boxes: If the machine is a mail server (using sendmail et al.) there is a pretty good chance that users have their active mail boxes at /var/mail.
  • In flight mail messages: Most machines process some email. For example, if a cron job generates output it is sent to the user via email. Many non-web mail clients invoke /bin/mail or /usr/lib/sendmail to cause mail to be sent. Each message spends somewhere between a few milliseconds and a few days in /var/spool/mqueue or /var/spool/clientmqueue.
  • Print jobs: If the machine acts as a print server (even for a directly attached printer) each print job spends a bit of time in /var/spool/lp.
  • Logs: When something goes wrong, it is often useful to look in log messages to figure out why it went wrong. Those are often found under /var/adm.
  • Temporary files that may not be: It is rather common for people to stick stuff in /var/tmp and expect to be able to find it sometime in the future.
  • DHCP: If a machine is a dhcp server, it will store configuration and/or state information in /var/dhcp.

All of those things should be of a stable file format and usable before and after you patch/upgrade/whatever. If you take the traditional Live Upgrade approach, you can patch or upgrade to an alternate boot environment. As part of activating the new environment, a bunch of files are copied between boot environments. According to this page the following things are synchronized:

/var/mail                    OVERWRITE
/var/spool/mqueue            OVERWRITE
/var/spool/cron/crontabs     OVERWRITE
/var/dhcp                    OVERWRITE
/etc/passwd                  OVERWRITE
/etc/shadow                  OVERWRITE
/etc/opasswd                 OVERWRITE
/etc/oshadow                 OVERWRITE
/etc/group                   OVERWRITE
/etc/pwhist                  OVERWRITE
/etc/default/passwd          OVERWRITE
/etc/dfs                     OVERWRITE
/var/log/syslog              APPEND
/var/adm/messages            APPEND

Notice that the default configuration loses your in flight print jobs because /var/spool/lp is not copied. Suppose you have a mail server with a few gigs of mail at /var/mail. Is it a good use of time or disk space to copy /var/mail between boot environments?
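
A quick way to see what a synclist misses is to check a list of directories you care about against it. The sketch below assumes the two-column path/action layout shown above; unsynced_paths is a made-up helper, not part of Live Upgrade, and the sample file is a scratch copy of a few of the entries rather than the real /etc/lu/synclist.

```sh
#!/bin/sh
# Print each given path that has no entry in a synclist-style file.
# unsynced_paths is a hypothetical helper, not part of Live Upgrade.
# Note: on Solaris use nawk or /usr/xpg4/bin/awk, since old awk lacks -v.
unsynced_paths() {
    list=$1; shift
    for p in "$@"; do
        # Column 1 is the path, column 2 the action (OVERWRITE, APPEND).
        awk -v p="$p" '$1 == p { found = 1 } END { exit !found }' "$list" \
            || echo "$p"
    done
}

# A few entries copied from the list above, written to a scratch file.
sample=${TMPDIR:-/tmp}/synclist.$$
cat > "$sample" <<'EOF'
/var/mail                    OVERWRITE
/var/spool/mqueue            OVERWRITE
/var/dhcp                    OVERWRITE
/var/adm/messages            APPEND
EOF

# Prints the paths that would be left behind in the old boot environment.
unsynced_paths "$sample" /var/mail /var/spool/lp /var/spool/clientmqueue
rm -f "$sample"
```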

A much better solution seems to be to make those directories shared between the boot environments. The way to do this in Live Upgrade and presumably in the future is to remove them from (or not add them to) /etc/lu/synclist and allocate separate file systems. However, do you really want a file system for /, /var/mail, /var/spool/mqueue, /var/spool/clientmqueue, /var/spool/lp, /var/adm, /var/tmp, /var/dhcp, …? What if you had someone tell you that you had to monitor every file system on every machine for being out of space? How big would you make all of those file systems so that your monitoring didn’t wake you up in the middle of the night?

In the future, it looks as though OpenSolaris will use ZFS to store each boot environment. Among the features of ZFS that make this desirable are snapshots, clones, and rethinking the boundary between disk slices (or volumes) and file systems. If the organization of /var is changed just a bit…

/var/adm -> share/adm
/var/dhcp -> share/dhcp
/var/mail -> share/mail
/var/spool -> share/spool
/var/tmp -> share/tmp

Then you can get by with having two zfs file systems: / and /var/share. The Snap Upgrade process would then likely do the following:

  1. Take a snapshot of /, clone it, then mount it somewhere usable in subsequent steps (e.g. /mnt/abe)
  2. Do whatever is needed on the alternate boot environment mounted at /mnt/abe.
  3. Unmount the alternate boot environment
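
In zfs terms, those three steps might look roughly like the dry-run sketch below. The dataset names (rpool/be1, rpool/be2) and the snapshot name are invented for illustration, and this is only a guess at the shape of the process, not what Snap Upgrade actually runs. Nothing is executed unless DRYRUN=0 is set.

```sh
#!/bin/sh
# Dry-run sketch of creating an alternate boot environment (ABE) with zfs.
# Dataset and snapshot names are hypothetical. Commands are only printed
# unless DRYRUN=0 is set in the environment.
DRYRUN=${DRYRUN:-1}
run() {
    if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

create_abe() {
    cur=$1 new=$2 mnt=$3
    run zfs snapshot "$cur@snapupgrade"        # 1. snapshot the current /
    run zfs clone "$cur@snapupgrade" "$new"    #    clone the snapshot...
    run zfs set mountpoint="$mnt" "$new"       #    ...and mount the clone
    # 2. patch or upgrade the files under $mnt here
    run zfs unmount "$new"                     # 3. unmount the ABE
}

create_abe rpool/be1 rpool/be2 /mnt/abe
```

Because a clone shares its blocks with the snapshot until they are changed, step 1 costs seconds and very little disk, no matter how large / is.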

When it comes time to activate the new boot environment, there are some files that are likely to need to be synchronized using the traditional mechanism. For instance, if someone tried to get into a system by guessing a user’s password, there is a reasonable chance that the account was locked via a modification to /etc/shadow. Presumably you don’t want to give the bad guy another chance when you activate the new boot environment. Note, however, that the files that may need to be synchronized in /etc are nearly always small files and there would not be very many of them. The files in /var/share would not need to be synchronized. However, just in case the new version of sendmail decides to eat mailboxes, it would be very nice to be able to recover.

This means that activating a boot environment would look like:

  1. Bring the system into single-user mode
  2. Mount the alternate boot environment
  3. Synchronize those files that need to be synchronized
  4. Take a snapshot of /var/share
  5. Set the boot loader to boot from the new boot environment and offer a failback option to the old boot environment
  6. Reboot

The items in italics are special to boot environment activation. Each one should take a couple seconds or less – adding far less than thirty seconds to the normal reboot process to activate the new boot environment. Failback would be similarly quick.

Now suppose this system is a bit more complicated and has 20 zones on it. Have you ever patched a system with 20 zones on it? Did you start on Friday and finish on Monday? How happy were the users with the “must install in single-user mode” requirement? This same technique should allow you to have two file systems per non-global zone – one for the zone root and one for /var/share in the zone. Supposing that the reboot processing takes 5 seconds per zone, you are looking at an extra minute or two to reboot rather than a weekend of down time.

Without Live Upgrade or Snap Upgrade, what would backout look like? After you had the system down for patching for a couple days, you could take it down again for a couple days to back the patches out. Or you could go to tape. Neither is an attractive option. With Snap Upgrade you should be able to fail back with your normal reboot time plus a minute.

Today’s dtrace one-liners April 21, 2007

Posted by anonanon in dtrace, solaris.

Well, I’ve been spending a bit more time with dtrace lately. I’m now starting to do interesting things with dtrace without looking at the manual or snagging other scripts. Today I found some processes that looked like they weren’t doing much of use. In particular, I saw the following in “prstat -mL”

 15274 oracle    46  53 0.0 0.0 0.0 0.0 0.0 0.0   0  67 .22   0 racgmain/1
 15295 oracle    47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  74 .23   0 racgmain/1
 16810 oracle    47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  66 .23   0 racgmain/1
 25029 oracle    47  53 0.0 0.0 0.0 0.0 0.0 0.0   0  70 .23   0 racgmain/1

First, it is confusing that it says that these processes are doing less than one system call per second (.22 or .23), but spending more than 50% in system calls. Looks like a bug in prstat. To see what system call was being used…

# dtrace -n 'syscall:::entry/execname == "racgmain"/ { @[probefunc] = count() }'
dtrace: description 'syscall:::entry' matched 232 probes

  read                                                         269039

Ahh, so it is only reads that are causing all of this. Let’s see which file descriptors are involved.

# dtrace -n 'syscall::read:entry/execname == "racgmain"/{ @[pid, probefunc,arg0] = count() }'
    22469  read      9                7
    22469  read      4                7
    22469  read      3                9
    22469  read      8                9
    22469  read     10               10
    22469  read      7               21
    22469  read      6               24
    15274  read     10          1026462
    15295  read     10          1044932
    25029  read     10          1051450
    16810  read     11          1060720

It seemed rather odd that there would be so much reading on a single file descriptor per process (10 or 11, depending on the process). If this were really the case, I would expect to see some processes showing up in prstat that were high in %sys doing writes. Using pfiles on each of the pids showed that the file descriptor in question was a FIFO, but gave no indication what (if anything) was on the other end of the FIFO.

To help answer the question of what might be on the other end of the FIFO, I tried this (ok, not a one liner…)

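#!/usr/sbin/dtrace -qs

syscall::read:return    /* reconstructed probe; arg0 = read() return value */
/execname == "racgmain"/
{
        @c[pid, arg0] = count();
}

tick-1sec               /* reconstructed probe: report once per second */
{
        printf("%6s %6s %4s\n", " pid  ", " size ", "count");
        printa("%6d %6d %@4d\n", @c);
        trunc(@c);      /* start the next interval from zero */
}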

Which gave the following output…

 pid    size  count
 15295      0 117524
 15274      0 121065
 25029      0 121308
 16810      0 125358
 pid    size  count
 15274      0 108177
 25029      0 109639
 15295      0 115735
 16810      0 117175
 pid    size  count
 15295      0 119411
 15274      0 119471
 25029      0 120273
 16810      0 120650

Imagine that… these processes are each doing 100,000+ zero-byte reads per second. Time to tell the DBAs that they have some real problems with one of their daemons.

reproducible hang with ldom preview April 12, 2007

Posted by anonanon in Uncategorized.

This blog entry is because formatting is horribly broken at http://forum.java.sun.com/thread.jspa?threadID=5159568 where I originally posted it.

I configured a T2000 as described in the beginner’s guide (http://www.sun.com/blueprints/0207/820-0832.pdf) with the exception of the device allocated for the root disk. For that I came up with my own variant of http://unixconsole.blogspot.com/2007/04/time-to-build-guest-domain.html.

My variant of using a file involved creating the file with mkfile on a zfs file system. That is…

zpool create zfs mirror c1t0d0s4 c1t1d0s4
zfs create zfs/ldoms
zfs set compress=on zfs/ldoms
mkfile 32G /zfs/ldoms/root.img

As I install Solaris in the ldom, the server (control domain) dies after extracting a few hundred megabytes of a flash archive. I have traced this down to it running out of memory.

Here’s “vmstat 4” output on the control domain console:

0 0 0 8636536 24776  1  70  0  0  0  0  0  0  0  0  0 1954  311 2244  0 23 76
0 0 0 8645096 22280  1  36  0  0  0  0  0  0  0  0  0 5957  370 9827  0 19 80
0 0 0 8649008 17720  1  40  0 296 313 0 44 0  0  0  0 9975  361 15877 0 24 75
0 0 0 8651104 15944  1  52  0 807 1671 0 700 0 0 0  0 10725 347 17545 0 26 74
0 0 0 8650800 18376  0  60  0 88 239 0 127 0  0  0  0 9816  391 15545 1 33 67
0 0 0 8640432 15936  0  76  0 497 3025 0 3874 0 0 0 0 11367 432 17975 0 35 65
0 0 0 8642968 17032  1  59  0 452 2028 0 842 0 0 0  0 10266 363 16127 0 27 73
kthr      memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr m0 m1 m2 m1   in   sy   cs us sy id
0 0 0 8644768 15744  0  56  0 387 1298 0 126 0 0 0  0 10170 330 16355 0 24 75
0 0 0 8652504 18368  1 113  0 372 2462 0 273 0 0 0  0 11171 321 18613 0 35 65
0 0 0 8652832 15720  1 134  0 411 6081 0 738 0 0 0  0 11541 332 18979 0 34 66
0 0 0 8652232 14312  1  94  0 413 1806 0 7775 0 0 0 0 10718 358 18271 0 38 62
0 0 0 8647360 12592 18 133  9 555 5176 0 17490 1 0 1 0 10394 320 16970 1 37 63
0 0 0 8645248 14408  2  73 22 486 5039 0 3111 2 1 1 0 11749 383 18336 0 40 59
2 0 43 8641800 2784  1 148 99 1070 1517 0 53982 19 9 9 5 8316 356 14226 0 43 57
0 0 116 8647032 800  1  42 127 134 312 3688 76207 14 7 7 1 2153 114 3726 0 29 71

At this point the server froze. Note that 116 processes were swapped and the “de” column is 3688. Very bad news.
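
To spot this kind of collapse without eyeballing every row, the vmstat output can be run through a small filter. The sketch below assumes the Solaris column order shown above (w is field 3, free is field 5, de is field 11, sr is field 12); flag_pressure is just a name I made up.

```sh
#!/bin/sh
# Flag vmstat samples that show memory pressure: swapped-out LWPs
# (w, field 3) or an anticipated memory deficit (de, field 11).
# Header lines are skipped because their first field is not a number.
# flag_pressure is a hypothetical helper name.
flag_pressure() {
    awk '$1 ~ /^[0-9]+$/ && ($3 > 0 || $11 > 0) {
        printf "w=%s free=%sK de=%s sr=%s\n", $3, $5, $11, $12
    }'
}

# Three of the samples above; only the last two should be reported.
flag_pressure <<'EOF'
0 0 0 8645248 14408  2  73 22 486 5039 0 3111 2 1 1 0 11749 383 18336 0 40 59
2 0 43 8641800 2784  1 148 99 1070 1517 0 53982 19 9 9 5 8316 356 14226 0 43 57
0 0 116 8647032 800  1  42 127 134 312 3688 76207 14 7 7 1 2153 114 3726 0 29 71
EOF
```

On a live system the same function could be fed with “vmstat 4 | flag_pressure”.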

My initial thoughts were that I was running into some of the low-memory problems known to happen with the ZFS ARC. This does not seem to be the case. According to mdb, the ARC size is around 60 MB:

# mdb unix.3 vmcore.3
> arc::print -td size
uint64_t size = 0t61455360

The control domain is S10 11/06 + 118833-36 + the patches required for ldoms + many others. The ldom being installed is booted from an S10 11/06 netinstall image (118833-33).

Random password one-liner September 1, 2006

Posted by anonanon in Uncategorized.

I recently came up with this method for generating reasonable random 8-character passwords:

$ dd if=/dev/random bs=6 count=1 2>/dev/null | openssl base64

If 8 characters is not long enough, increase the number after bs= to 75% of the number of characters you would like in the password.
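
The 75% rule comes from base64 turning every 3 bytes into 4 characters. Here is a sketch that wraps the arithmetic and the pipeline in two functions; pw_bytes and random_password are names I made up, and the trimming with cut is my addition so that lengths that are not a multiple of 4 also work.

```sh
#!/bin/sh
# pw_bytes and random_password are hypothetical helper names.

# Random bytes needed for N base64 characters: ceil(N * 3 / 4).
pw_bytes() {
    echo $(( ($1 * 3 + 3) / 4 ))
}

# Generate an N-character password (default 8) with the same
# dd | openssl pipeline, then trim padding or surplus characters.
random_password() {
    n=${1:-8}
    dd if=/dev/random bs="$(pw_bytes "$n")" count=1 2>/dev/null |
        openssl base64 | tr -d '\n' | cut -c1-"$n"
}

random_password 12
```

For 8 characters pw_bytes gives 6, which matches the bs=6 in the one-liner.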

Install Solaris from DVD image on disk August 27, 2006

Posted by anonanon in Uncategorized.

My personal SPARC machine is pathetic by today’s standards – An Ultra II with a pair of 300 MHz processors, 768 MB RAM, and a very slow CDROM drive. This is pretty much the slowest machine that is supported by Solaris 10. That, and today I decided it was time to get a fresh installation of Solaris Express (build 46) on it.

I first tried the live upgrade route. However, that didn’t work out too well because I had previously used bfu to get some newer OpenSolaris bits on the machine. I really did not want to repeat the download process for all the CD ISO’s (already had downloaded the DVD ISO). Now, if you think that downloading and burning is slow – you should see the speed of the installation on this CDROM drive. It was probably OK in the days when Solaris fit on one CD, but not today with 5(?) CD’s to complete the installation.

The disk layout of the machine was as follows:

  • c0t0d0 32 GB disk
    • c0t0d0s0 – 4.5 GB available for new /
    • c0t0d0s1 – ~500 MB swap
    • c0t0d0s7 – remainder as zfs pool “pool0”
  • c0t1d0 4 GB disk
    • c0t1d0s0 – Root with build 36 (?) + random BFU bits

I had the DVD image in a subdirectory of my home directory that was in the pool0/home file system in the zfs pool.

To make use of that DVD image without buying a SCSI DVD drive, I did the following:

  1. Burn build 46 CD0 to a CD-R
  2. Boot from the CD-R
  3. Go clean up the shop from the woodworking I was doing earlier
  4. Do some laundry
  5. Return to the Ultra II to find that it was just about to ask me which language I speak. Really, it was still working on it. Now do you know why I didn’t want to feed it 5 CD’s?
  6. Answer sysidcfg questions
  7. Exit the installer
  8. zpool import pool0. After the import was complete but before mounting file systems, zpool crashed with a segv. Later I saved that core file to /a for later analysis
  9. zfs set mountpoint=/tmp/home pool0/home
  10. zfs mount pool0/home
  11. lofiadm -a /tmp/home/build46.iso
  12. umount /cdrom
  13. mount -F hsfs -o ro /dev/lofi/1 /cdrom
  14. install-solaris
  15. Go blog about a cool hack. 🙂

The installation is now about 40% done. Looks like the hack is working just fine. I wonder if I could bundle this all up in a begin script (especially the laundry) to automate the installation from an ISO image after booting from local media.
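
As a first stab at that, steps 8 through 14 could be strung together roughly as below. This is an untested sketch: the ISO path is the one from my steps, the script prints the commands instead of running them unless DRYRUN=0 is set, and it picks up the lofi device from the output of lofiadm -a rather than assuming /dev/lofi/1. The laundry is left as an exercise.

```sh
#!/bin/sh
# Dry-run sketch of steps 8-14 as one script. Commands are only printed
# unless DRYRUN=0 is set. The ISO location is an assumption.
DRYRUN=${DRYRUN:-1}
ISO=${ISO:-/tmp/home/build46.iso}

run() {
    if [ "$DRYRUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

install_from_iso() {
    run zpool import pool0
    run zfs set mountpoint=/tmp/home pool0/home
    run zfs mount pool0/home
    # lofiadm -a prints the block device it allocated (e.g. /dev/lofi/1).
    if [ "$DRYRUN" = 1 ]; then
        echo "+ lofiadm -a $ISO"
        dev=/dev/lofi/1
    else
        dev=$(lofiadm -a "$ISO") || return 1
    fi
    run umount /cdrom
    run mount -F hsfs -o ro "$dev" /cdrom
    run install-solaris
}

install_from_iso
```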

Update on zoneadm create with zfs March 1, 2006

Posted by anonanon in Uncategorized.

After reaching out to Sun to work on getting my work integrated into OpenSolaris, I found that Sun was already working on this feature. Subsequently, they indicated that the code made it into some internal source code tree. As such, I am holding off on future development until I can get at that code.

However, if you are wanting to try it out, I have posted the code for others to play with. If you have a working OpenSolaris build environment, you should be able to drop in my modified zoneadm.c, run dmake all, then use the resulting zoneadm command. Alternatively, the sparc version of the zoneadm binary is also available.



Zone created in 0.922 seconds February 20, 2006

Posted by anonanon in Uncategorized.

I noticed today that in the latest OpenSolaris code that “zoneadm clone” exists. Unfortunately, cloning a zone only offered the copy mechanism that was essentially “find | cpio”. A bit of hacking later and we have this:

# time ksh -x /var/tmp/clone
+ newzone=fast
+ template=template
+ zoneadm=/ws/usr/src/cmd/zoneadm/zoneadm
+ PATH=/usr/bin:/usr/sbin
+ zonecfg -z fast create -t template
+ zonecfg -z fast set zonepath=/zones/fast
+ /ws/usr/src/cmd/zoneadm/zoneadm -z fast clone -m zfsclone template
Cloning zonepath /zones/template...

real    0m0.922s
user    0m0.128s
sys     0m0.171s

This is achieved by using zfs to create a snapshot of the template zone, then cloning the snapshot to create the zonepath of the new zone. A bit of cleanup is needed, but goodness is on the way.
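
Done by hand, the zfs part amounts to two commands. The sketch below only prints them; it assumes each zonepath is its own dataset under a parent dataset named zones mounted at /zones, which is my setup and not something zoneadm requires, and the snapshot naming is arbitrary.

```sh
#!/bin/sh
# Print the zfs commands behind a snapshot-and-clone zone copy.
# Assumes a hypothetical parent dataset "zones" mounted at /zones,
# with one child dataset per zonepath.
zfs_clone_cmds() {
    template=$1 newzone=$2
    echo "zfs snapshot zones/$template@$newzone"
    echo "zfs clone zones/$template@$newzone zones/$newzone"
}

zfs_clone_cmds template fast
```

Neither command copies file data, which is why the whole thing finishes in under a second where “find | cpio” has to read and write every file in the zone.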