On Mon, Aug 24, 2015 at 11:01:45AM -0700, Sage Weil wrote:
...
> I looked over the wip-addr branch a bit. I have two basic
> questions/concerns:
>
> 1) In commit
>
> https://github.com/ceph/ceph/commit/73b09090466a43d5aceb979a4802de3f3f5bf24a
> we switch the type field from sa_family to transport_type. This seems
> like the way to go, but we need to deal with the fact that lots of
> clusters out there are IPv6 and have AF_INET6 filled in here. Probably
> both types should be interpreted to mean "existing/legacy IP messenger" or
> whatever we want to call SimpleMessenger/AsyncMessenger's protocol.
>
> I think encode needs to make sure it fills in that value for the type when
> encoding the legacy entity_addr_t encoding, but could use a/the single
> valid value for the new encoding. And any get_transport_type() accessor
> should also return the single valid value.
With cohortfs I had the luxury of ignoring existing installations.
I think I concluded at the time that probably most real systems
had '0' stored here.
Yes, if there are things that have "junk" here, they'll need to be handled
appropriately - I think it can be mostly fixed by just having decode map
transport_type "AF_INET6" (which is probably 10) to 0. AF_INET is 2,
so maybe that too. People switching *back* to older code after using
new code won't be so happy - would probably need a patch for older code
if that's a concern.
If there's code out there that thinks get_transport_type() returns an address
family, that will need fixing. something like,
get_address_family(){ return in_addr->ss_family; }
but I didn't find any code that actually needed this.
>
> 2) In the later commit
>
> https://github.com/ceph/ceph/commit/9d203a2058f76414703b4fc212a1a0a960d0c672
> you introduce a grammar for printing/parsing the addrs. This also makes
> sense since e.g. xio uses an IP to identify an endpoint. I think we
> should identify these based on the *protocol* and not the implementation,
> though... whether we use SimpleMessenger or AsyncMessenger is not a
> property of the address. Maybe "tcp://" makes more sense here? Or
> perhaps no prefix at all (a bare IP address), so that this looks the same
> as it did before in the case where the default protocol(s) are in use.
>
> I assume the xio protocol (whether it is rdma or tcp) is closely tied to
> libxio itself.. is that right? If so, using xio in that prefix makes
> sense. I'd include xio somewhere in the rdma prefix though (xrdma:// and
> xtcp://)?
Code base I had didn't have async messenger as a transport type; if that
needs to be addressable independently of simple messenger it will need a
prefix. I had a prefixless address decode as simple messenger - I
can make this more obvious than "unsigned type = 0;".
I had these prefixes,
sm
rdma
xtcp
but I can easily pick others. I figured rdma didn't need an x since
there didn't seem to be much chance anybody would want a non-xio version
of rdma.
>
> What do you think?
>
> Logistically, I think the steps for getting this ready for merge are:
>
> 1) Separate out the preliminary patches that pass a feature to the addr
> encoding.. without any of the other cohortfs patches that are currently on
> this branch. Once this builds we can merge it separately from the rest...
>
> 2) The entity_addrvec_t type.
>
> 3) The type -> transport_type switch.
There's a lot of cohortfs glop that still needs to go away in this branch.
I hadn't thought of structuring the patch that way. I can do it,
but the encoding feature will be tedious to split out.
There's a bunch of cascading stuff after entity_addrvec_t which
could go into a series of further patches, monmap, and so forth,
not all of which exist yet.
>
> 4) We should make the new entity_addr_t encoding encode sockaddr piece
> more compactly instead of eating up a full 80-byte sockaddr_storage even
> for the ~8-byte IPv4 sockaddr_in. Maybe just need to encode an
> explicit length for the sockaddr_* piece?
Yes, this could be made more memory efficient. It's mainly a matter
of making sure the constructor gets the proper amount of memory.
There's a C hack to do this, but I remember seeing a c++ technique
that was a bit less hackish.
A sockaddr_in is actually 16 bytes of which the last 8 are dummy
padding (sin_zero). (And there there are some systems that
actually check to be sure they're 0.)
BSD4.4 derived systems have a length coded into sockaddr too - but on
linux it's usually possible to get by without storing the length, and
I don't think ceph will need to do so.
...
-Marcus Watts
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html