Hello Barbaros,
thank you for testing and for the excellent report.
</snip>
> ddb{1}> trace
> db_enter() at db_enter+0x10
> panic(ffffffff81f22e39) at panic+0xbf
> __assert(ffffffff81f96c9d,ffffffff81f85ebc,a3,ffffffff81fd252f) at
> __assert+0x25
> assertwaitok() at assertwaitok+0xcc
> mi_switch() at mi_switch+0x40
The assert indicates we attempt to sleep inside an SMR read section,
which must be avoided.
> sleep_finish(ffff800025574da0,1) at sleep_finish+0x10b
> rw_enter(ffffffff822cfe50,1) at rw_enter+0x1cb
> pf_test(2,1,ffff80000520e000,ffff800025575058) at pf_test+0x1088
> ip_input_if(ffff800025575058,ffff800025575064,4,0,ffff80000520e000) at
> ip_input_if+0xcd
> ipv4_input(ffff80000520e000,fffffd8053616700) at ipv4_input+0x39
> ether_input(ffff80000520e000,fffffd8053616700) at ether_input+0x3ad
> vport_if_enqueue(ffff80000520e000,fffffd8053616700) at vport_if_enqueue+0x19
> veb_port_input(ffff8000051c3800,fffffd806064c200,ffffffffffff,ffff800002066600)
> at veb_port_input+0x4d2
> ether_input(ffff8000051c3800,fffffd806064c200) at ether_input+0x100
> vlan_input(ffff80000095a050,fffffd806064c200,ffff8000255752bc) at
> vlan_input+0x23d
> ether_input(ffff80000095a050,fffffd806064c200) at ether_input+0x85
> if_input_process(ffff80000095a050,ffff800025575358) at if_input_process+0x6f
> ifiq_process(ffff80000095a460) at ifiq_process+0x69
> taskq_thread(ffff800000035080) at taskq_thread+0x100
The call stack above did a bad thing: it slept inside an SMR section.
In my opinion the primary suspect is veb_port_input(), whose code reads as
follows:
966 static struct mbuf *
967 veb_port_input(struct ifnet *ifp0, struct mbuf *m, uint64_t dst, void *brport)
968 {
969 struct veb_port *p = brport;
970 struct veb_softc *sc = p->p_veb;
971 struct ifnet *ifp = &sc->sc_if;
972 struct ether_header *eh;
...
1021 counters_pkt(ifp->if_counters, ifc_ipackets, ifc_ibytes,
1022 m->m_pkthdr.len);
1023
1024 /* force packets into the one routing domain for pf */
1025 m->m_pkthdr.ph_rtableid = ifp->if_rdomain;
1026
1027 #if NBPFILTER > 0
1028 if_bpf = READ_ONCE(ifp->if_bpf);
1029 if (if_bpf != NULL) {
1030 if (bpf_mtap_ether(if_bpf, m, 0) != 0)
1031 goto drop;
1032 }
1033 #endif
1034
1035 veb_span(sc, m);
1036
1037 if (ISSET(p->p_bif_flags, IFBIF_BLOCKNONIP) &&
1038 veb_ip_filter(m))
1039 goto drop;
1040
1041 if (!ISSET(ifp->if_flags, IFF_LINK0) &&
1042 veb_vlan_filter(m))
1043 goto drop;
1044
1045 if (veb_rule_filter(p, VEB_RULE_LIST_IN, m, src, dst))
1046 goto drop;
The call to veb_span() at line 1035 seems to be the culprit (in my opinion):
356 smr_read_enter();
357 SMR_TAILQ_FOREACH(p, &sc->sc_spans.l_list, p_entry) {
358 ifp0 = p->p_ifp0;
359 if (!ISSET(ifp0->if_flags, IFF_RUNNING))
360 continue;
361
362 m = m_dup_pkt(m0, max_linkhdr + ETHER_ALIGN, M_NOWAIT);
363 if (m == NULL) {
364 /* XXX count error */
365 continue;
366 }
367
368 if_enqueue(ifp0, m); /* XXX count error */
369 }
370 smr_read_leave();
The loop above comes from veb_span(), which calls if_enqueue() from within
an SMR section. The call at line 368 lands here:
2191 static int
2192 vport_if_enqueue(struct ifnet *ifp, struct mbuf *m)
2193 {
2194 /*
2195 * switching an l2 packet toward a vport means pushing it
2196 * into the network stack. this function exists to make
2197 * if_vinput compat with veb calling if_enqueue.
2198 */
2199
2200 if_vinput(ifp, m);
2201
2202 return (0);
2203 }
which in turn calls if_vinput(), which calls further down into the IP stack,
and the IP stack may sleep. We must change veb_span() so that calls to
if_vinput() happen outside of the SMR section.
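To illustrate the direction, here is a userland sketch only, not the diff: the
loop could collect the duplicated packets on a local list while inside the
read section and hand them on only after leaving it. Every name below
(mock_smr_read_enter(), mock_smr_read_leave(), mock_enqueue(), span_inside(),
span_deferred(), struct mock_pkt) is a hypothetical stand-in, not kernel API.
A real change would presumably also have to keep each span port's ifp valid
after smr_read_leave(), for example by holding a reference, which the sketch
does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; none of this is the kernel API. */
static int smr_depth;		/* > 0 while inside a mock SMR read section */
static int delivered;		/* packets handed to the mock stack */
static int sleeps_in_smr;	/* enqueues that happened inside the section */

struct mock_pkt {
	struct mock_pkt *next;
};

static void
mock_smr_read_enter(void)
{
	smr_depth++;
}

static void
mock_smr_read_leave(void)
{
	assert(smr_depth > 0);
	smr_depth--;
}

/* Stand-in for if_enqueue(): assume it may sleep, so record violations. */
static void
mock_enqueue(struct mock_pkt *m)
{
	(void)m;
	if (smr_depth > 0)
		sleeps_in_smr++;
	delivered++;
}

/* Shape of the current loop: enqueue from within the read section. */
static void
span_inside(struct mock_pkt *pkts, size_t n)
{
	size_t i;

	mock_smr_read_enter();
	for (i = 0; i < n; i++)
		mock_enqueue(&pkts[i]);
	mock_smr_read_leave();
}

/* Deferred shape: only link packets inside, enqueue after leaving. */
static void
span_deferred(struct mock_pkt *pkts, size_t n)
{
	struct mock_pkt *head = NULL, **tail = &head, *m;
	size_t i;

	mock_smr_read_enter();
	for (i = 0; i < n; i++) {
		pkts[i].next = NULL;
		*tail = &pkts[i];
		tail = &pkts[i].next;
	}
	mock_smr_read_leave();

	/* Outside the read section, so sleeping here would be allowed. */
	while ((m = head) != NULL) {
		head = m->next;
		mock_enqueue(m);
	}
}
```

With three packets, span_inside() records three enqueues inside the section,
while span_deferred() delivers the same three packets with none inside it.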
I don't have a setup complex enough to use vlans and virtual ports. I'll try to
cook up a diff and pass it to you for testing.
Thanks again for coming back to us with the report.
regards
sashan