Bug 9508 - Captive portal test only run during catalog refresh
Status: RESOLVED FIXINSOURCE
Product: pkg
Component: transport
Version: unspecified
Platform: ANY/Generic All
Importance: P2 major
Target Milestone: ---
Assigned To: johansen
QA Contact: pkg/transport watcher
Reported: 2009-06-17 17:37 UTC by johansen
Modified: 2009-07-01 16:39 UTC

Description johansen 2009-06-17 17:37:23 UTC
The captive portal test serves two different purposes.

1. It ensures that if the client is on a captive network, the pkg system
doesn't inadvertently attempt a download.

2. It catches systemic problems with the client's network configuration, should
it be unable to contact any valid host.

The test used to be run prior to any image-modifying operation.  However, now
it's only invoked prior to a catalog update, which happens only occasionally.
This means that there are many cases where the client's network configuration
is totally busted, but we'll attempt to drive on since we think at least one of
the configured hosts is reachable.  This leads to a seriously degraded user
experience.  In the case where no hosts are reachable, the client should give
up instead of continuing to try to make progress.
Comment 1 Shawn Walker 2009-06-18 08:47:46 UTC
It was always run only during catalog refresh; the catch is that we now only do
a catalog refresh if needed instead of always doing one.

But note that valid_publisher_test, a stricter test than captive_portal_test,
is now performed when adding or updating publishers.

Bug 8487 was the last major change to this behaviour.
Comment 2 johansen 2009-06-18 11:28:40 UTC
(In reply to comment #1)
> It was always run only during catalog refresh; the catch is that we now only do
> a catalog refresh if needed instead of always doing one.

The test was put in the catalog refresh path, since we always ran a refresh
prior to an image-modifying operation.  The test's existence in that path was
always acknowledged as a hack, but at the time it was necessary.  We still need
to run the captive portal test prior to image-modifying operations that contact
the network.

The reason for this is that we mark a number of potentially fatal errors
as retryable.  If the client tries to perform an image-modifying operation and
no hosts are reachable, it will do a lot of work trying to contact everyone on
the network, despite network connectivity being non-existent.  If we run the
test prior to contacting hosts, it's a quick check to make sure that
connectivity exists, and it prevents us from spending a bunch of time trying to
recover from fatal errors that we've assumed are transient.
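
For illustration only, the fail-fast idea described above might be sketched
roughly as follows.  This is not the pkg transport code: the names
preflight_check, run_image_operation, tcp_probe, and NoReachableHostError are
all hypothetical, and the probe is left pluggable on the assumption that a real
captive portal test would also validate the response it gets back, which a
plain TCP connect does not do.

```python
import socket


class NoReachableHostError(Exception):
    """Raised when the preflight check finds no reachable host (hypothetical)."""


def tcp_probe(host, port=80, timeout=3.0):
    """Hypothetical probe: True if a TCP connection to host:port succeeds.

    A connect alone cannot tell a real host from a captive portal; a real
    test would also have to inspect what the host sends back.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight_check(hosts, probe=tcp_probe):
    """Return the hosts that pass the probe; give up if none do."""
    reachable = [h for h in hosts if probe(h)]
    if not reachable:
        raise NoReachableHostError(
            "no configured host is reachable; giving up")
    return reachable


def run_image_operation(hosts, operation, probe=tcp_probe, needs_network=True):
    """Run the quick check once, before any retryable work begins."""
    if needs_network:
        hosts = preflight_check(hosts, probe)
    return operation(hosts)
```

The check runs once, before the operation, so a client with no connectivity
stops immediately rather than entering the retry path for each host.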
Comment 3 johansen 2009-06-23 13:31:59 UTC
This bug is being fixed as part of the transport re-design.  A preliminary
webrev is available from:

http://cr.opensolaris.org/~johansen/webrev-xport-1/
Comment 4 johansen 2009-07-01 16:39:58 UTC
Integrated 1 Jul 2009 as changeset a48bee2a4b2e9c8345c29acea63116acf77dddb3