path: root/net
Age | Commit message | Author
2015-09-25 | inet: constify inet_rtx_syn_ack() sock argument | Eric Dumazet
SYNACK packets are sent on behalf of unlocked listeners or fastopen sockets. Mark socket as const to catch future changes that might break the assumption. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | tcp/dccp: constify rtx_synack() and friends | Eric Dumazet
This is done to make sure we do not change listener socket while sending SYNACK packets while socket lock is not held. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | dccp: constify dccp_make_response() socket argument | Eric Dumazet
Like tcp_make_synack(), the only time we might change the socket is when calling sock_wmalloc(), which uses an atomic operation to update sk->sk_wmem_alloc. Also use MAX_DCCP_HEADER, as both IPv4 and IPv6 use this value for max_header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | tcp: constify tcp_v{4|6}_send_synack() socket argument | Eric Dumazet
This documents the fact that the listener lock might not be held at the time SYNACKs are sent. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | ipv6: constify ip6_xmit() sock argument | Eric Dumazet
This is to document that socket lock might not be held at this point. skb_set_owner_w() and ipv6_local_error() are using proper atomic ops or spinlocks, so we promote the socket to non const when calling them. netfilter hooks should never assume socket lock is held, we also promote the socket to non const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | tcp: constify tcp_make_synack() socket argument | Eric Dumazet
listener socket is not locked when tcp_make_synack() is called. We better make sure no field is written. There is one exception : Since SYNACK packets are attached to the listener at this moment (or SYN_RECV child in case of Fast Open), sock_wmalloc() needs to update sk->sk_wmem_alloc, but this is done using atomic operations so this is safe. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
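The pattern described above, a const socket argument with a single atomic counter as the only permitted write, can be sketched in user-space C. This is a minimal illustration, not kernel code: `struct demo_sock`, `demo_charge_wmem()` and `demo_build_reply()` are hypothetical names, and C11 `atomic_int` stands in for the kernel's atomic type.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for struct sock (not the kernel structure). */
struct demo_sock {
    int plain_state;        /* ordinary field: writes need the socket lock */
    atomic_int wmem_alloc;  /* atomic counter: safe to update without it */
};

/*
 * Charge 'bytes' to the write-memory counter and return the new total.
 * The parameter stays const to document that no ordinary field is
 * written; the cast is confined to the one atomic update.
 */
static int demo_charge_wmem(const struct demo_sock *sk, int bytes)
{
    struct demo_sock *mutable_sk = (struct demo_sock *)sk;

    return atomic_fetch_add(&mutable_sk->wmem_alloc, bytes) + bytes;
}

/* Builds a reply on behalf of an unlocked listener. */
static int demo_build_reply(const struct demo_sock *listener, int payload_len)
{
    /* listener->plain_state = 1;   <- rejected by the compiler */
    return demo_charge_wmem(listener, payload_len);
}
```

The const qualifier turns an accidental write into a compile error, while the one deliberate exception stays visible as an explicit cast.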
2015-09-25 | tcp: remove tcp_ecn_make_synack() socket argument | Eric Dumazet
SYNACK packets might be sent without holding socket lock. For DCTCP/ECN sake, we should call INET_ECN_xmit() while socket lock is owned, and only when we init/change congestion control. This also fixes a bug if congestion module is changed from dctcp to another one on a listener: we now clear ECN bits properly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | tcp: remove tcp_synack_options() socket argument | Eric Dumazet
We do not use the socket in this function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | ip: constify ip_build_and_send_pkt() socket argument | Eric Dumazet
This function is used to build and send SYNACK packets, possibly on behalf of unlocked listener socket. Make sure we did not miss a write by making this socket const. We no longer can use ip_select_ident() and have to either set iph->id to 0 or directly call __ip_select_ident() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | tcp: md5: constify tcp_md5_do_lookup() socket argument | Eric Dumazet
When TCP new listener is done, these functions will be called without socket lock being held. Make sure they don't change anything. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | ipv6: constify inet6_csk_route_req() socket argument | Eric Dumazet
socket is not modified, make it const so that callers can do the same if they need. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | ipv6: constify ip6_dst_lookup_{flow|tail}() sock arguments | Eric Dumazet
ip6_dst_lookup_flow() and ip6_dst_lookup_tail() do not touch socket, let's add a const qualifier. This will permit the same change in inet6_csk_route_req(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | inet: constify inet_csk_route_req() socket argument | Eric Dumazet
This is used by TCP listener core, and listener socket shall not be modified by inet_csk_route_req(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | inet: constify ip_route_output_flow() socket argument | Eric Dumazet
Very soon, TCP stack might call inet_csk_route_req(), which calls ip_route_output_flow() with an unlocked listener socket, so we need to make sure ip_route_output_flow() is not trying to change any field from its socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | tcp: constify tcp_openreq_init_rwin() | Eric Dumazet
Soon, listener socket won't be locked when tcp_openreq_init_rwin() is called. We need to read socket fields once, as their value could change under us. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
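The "read socket fields once" rule can be illustrated with a small user-space sketch. All names here (`struct demo_listener`, `demo_read_once_uint()`, `demo_init_window()`) are hypothetical, and the halving of the buffer size is an arbitrary stand-in for the real window computation. The volatile access only mimics the spirit of a single forced load; a genuinely concurrent program would need proper atomics.

```c
/* Hypothetical listener with one field another thread may change. */
struct demo_listener {
    unsigned int rcvbuf;
};

/* Force exactly one load of *p; the compiler may not re-read it later. */
static unsigned int demo_read_once_uint(const unsigned int *p)
{
    return *(const volatile unsigned int *)p;
}

/*
 * Derive two values from ONE snapshot of rcvbuf. Reading the field twice
 * could yield a window and a clamp computed from different values if the
 * field changed in between.
 */
static void demo_init_window(const struct demo_listener *lsk,
                             unsigned int *window, unsigned int *clamp)
{
    unsigned int rcvbuf = demo_read_once_uint(&lsk->rcvbuf);

    *window = rcvbuf / 2;
    *clamp = rcvbuf;
}
```

Both outputs are guaranteed to be mutually consistent because they come from the same local copy.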
2015-09-25 | tcp: constify listener socket in tcp_v[46]_init_req() | Eric Dumazet
Soon, listener socket spinlock will no longer be held, add const arguments to tcp_v[46]_init_req() to make clear these functions cannot modify socket fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | net: remove unused argument of __netdev_find_adj() | Michal Kubeček
The __netdev_find_adj() helper does not use its first argument, only the device to find and list to walk through. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | l2tp: auto load IP modules | stephen hemminger
When creating an IP encapsulated tunnel the necessary l2tp module should be loaded. It already works for UDP encapsulation, it just doesn't work for direct IP encap. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | l2tp: auto load type modules | stephen hemminger
It should not be necessary to do explicit module loading when configuring L2TP. Modules should be loaded as needed instead (as is done already with netlink and other tunnel types). This patch adds a new module alias type and code to load the sub module on demand. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-25 | net: dsa: Set a "dsa" device_type | Florian Fainelli
Provide device_type information for slave network devices created by DSA; this is useful for user-space applications to easily locate/search for devices of a specific kind. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 | switchdev: reduce transaction phase enum down to a boolean | Jiri Pirko
Now, since we have only 2 values for transaction phase, just use bool. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 | dsa: use prepare/commit switchdev transaction helpers | Jiri Pirko
The enum is going to disappear, use the helpers instead. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 | switchdev: remove "ABORT" transaction phase | Jiri Pirko
No longer used by drivers, as transaction queue with item destructors takes care of abort phase internally in switchdev code. So kill it. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 | switchdev: move transaction phase enum under transaction structure | Jiri Pirko
Before it disappears completely, move transaction phase enum under transaction structure and make attr/obj structures a bit cleaner. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 | switchdev: introduce transaction item queue for attr_set and obj_add | Jiri Pirko
Now, the memory allocation in prepare/commit state is done separately in each driver (rocker). Introduce a similar mechanism in generic switchdev code, in the form of a queue. That can be used not only for memory allocations, but also for different items. Abort item destruction is handled as well. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
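The idea of a transaction item queue with per-item destructors can be sketched in plain C. This is an illustration under assumptions, not the switchdev implementation: every `demo_*` name is hypothetical, and the FIFO list is just one simple way to hold items queued during a prepare phase, hand them out during commit, and destroy leftovers on abort.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*demo_destructor_t)(void *data);

struct demo_item {
    struct demo_item *next;
    void *data;
    demo_destructor_t destructor;  /* called only if the item is aborted */
};

struct demo_trans {
    struct demo_item *head;
    struct demo_item *tail;
};

static int demo_destroyed;  /* counts destructor calls, for illustration */

static void demo_free_and_count(void *data)
{
    free(data);
    demo_destroyed++;
}

/* Prepare phase: queue a resource together with its destructor. */
static bool demo_trans_enqueue(struct demo_trans *t, void *data,
                               demo_destructor_t destructor)
{
    struct demo_item *item = malloc(sizeof(*item));

    if (!item)
        return false;
    item->next = NULL;
    item->data = data;
    item->destructor = destructor;
    if (t->tail)
        t->tail->next = item;
    else
        t->head = item;
    t->tail = item;
    return true;
}

/* Commit phase: take the oldest resource; the caller now owns it. */
static void *demo_trans_dequeue(struct demo_trans *t)
{
    struct demo_item *item = t->head;
    void *data;

    if (!item)
        return NULL;
    t->head = item->next;
    if (!t->head)
        t->tail = NULL;
    data = item->data;
    free(item);
    return data;
}

/* Abort: destroy everything still queued, so callers need no cleanup. */
static void demo_trans_abort(struct demo_trans *t)
{
    struct demo_item *item;

    while ((item = t->head) != NULL) {
        t->head = item->next;
        if (item->destructor)
            item->destructor(item->data);
        free(item);
    }
    t->tail = NULL;
}
```

Because the queue owns the abort path, a driver written against such an interface would not need a separate abort handler of its own.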
2015-09-24 | switchdev: rename "trans" to "trans_ph". | Jiri Pirko
This is temporary, name "trans" will be used for something else and "trans_ph" will eventually disappear. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 | tcp: factorize sk_txhash init | Eric Dumazet
Neal suggested to move sk_txhash init into tcp_create_openreq_child(), called both from IPv4 and IPv6. This opportunity was missed in commit 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 | ipv6: remove unused neigh parameter from ndisc functions | Jiri Benc
Since commit 12fd84f4383b1 ("ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers."), the neigh parameter of ndisc_send_na and ndisc_send_ns is unused. CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-24 | genetlink: simplify genl_notify | Jiri Benc
The genl_notify function has too many arguments for no real reason - all callers use genl_info to get them anyway. Just pass the genl_info down to genl_notify. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 | bridge: don't age externally added FDB entries | Siva Mannem
Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Premkumar Jonnala <pjonnala@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 | bridge: define some min/max/default ageing time constants | Scott Feldman
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 | cls_bpf: further limit exec opcodes subset | Daniel Borkmann
Jamal suggested to further limit the currently allowed subset of opcodes that may be used by a direct action return code as the intention is not to replace the full action engine, but rather to have a minimal set that can be used in the fast-path on things like ingress for some features that cls_bpf supports. Classifiers can, of course, still be chained together that have direct action mode with those that have a full exec pass. For more complex scenarios that go beyond this minimal set here, the full tcf_exts_exec() path must be used. Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 | cls_bpf: make binding to classid optional | Daniel Borkmann
The binding to a particular classid was so far always mandatory for cls_bpf, but it doesn't need to be. Therefore, lift this restriction as similarly done in other classifiers. Only a couple of qdiscs make use of class from the tcf_result, others don't strictly care, so let the user choose his needs (those that read out class can handle situations where it could be NULL). An explicit check for tcf_unbind_filter() is also not needed here, as the previous r->class was 0, so the xchg() will return that and therefore a callback to the qdisc's unbind_tcf() is skipped. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 | cls_bpf: also dump TCA_BPF_FLAGS | Daniel Borkmann
In commit 43388da42a49 ("cls_bpf: introduce integrated actions") we have added TCA_BPF_FLAGS. We can also retrieve this information from the prog, dump it back to user space as well. It's useful in tc when displaying/dumping filter info. Also, remove tp from cls_bpf_prog_from_efd(), came in as a conflict from a rebase and it's unused here (later work may add it along with a real user). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 | sched, bpf: let stack handle !IFF_UP devs on bpf_clone_redirect | Daniel Borkmann
As is already the case in the bpf_redirect()/skb_do_redirect() pair, let the stack deal with devs that are !IFF_UP. dev_forward_skb() as well as dev_queue_xmit() will free the skb and increment drop counter internally in such cases, so we can spare the condition in bpf_clone_redirect(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-23 | ipv6 Use get_hash_from_flowi6 for rt6 hash | Tom Herbert
In rt6_info_hash_nhsfn replace the custom hashing over flowi6 that is using xor with a call to common function get_hash_from_flowi6. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-22 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next | David S. Miller
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next tree in this 4.4 development cycle, they are:

1) Schedule ICMP traffic to IPVS instances, this introduces a new schedule_icmp proc knob to enable/disable it. It is off by default to retain the old behaviour. Patchset from Alex Gartrell.

   I'm also including what Alex originally said for the record:

   "The configuration of ipvs at Facebook is relatively straightforward. All ipvs instances bgp advertise a set of VIPs and the network prefers the nearest one or uses ECMP in the event of a tie. For the uninitiated, ECMP deterministically and statelessly load balances by hashing the packet (usually a 5-tuple of protocol, saddr, daddr, sport, and dport) and using that number as an index (basic hash table type logic).

   The problem is that ICMP packets (which contain really important information like whether or not an MTU has been exceeded) will get a different hash value and may end up at a different ipvs instance. With no information about where to route these packets, they are dropped, creating ICMP black holes and breaking Path MTU discovery. Suddenly, my mom's pictures can't load and I'm fielding midday calls that I want nothing to do with.

   To address this, this patch set introduces the ability to schedule icmp packets which is gated by a sysctl net.ipv4.vs.schedule_icmp. If set to 0, the old behavior is maintained -- otherwise ICMP packets are scheduled."

2) Add another proc entry to ignore tunneled packets to avoid routing loops from IPVS, also from Alex.

3) Fifteen patches from Eric Biederman to:

   * Stop passing nf_hook_ops as parameter to the hook and use the state hook object instead all around the netfilter code, so only the private data pointer is passed to the registered hook function.

   * Now that we've got state->net, propagate the netns pointer to netfilter hook clients to avoid its computation over and over again. A good example of how this has been simplified is the former TEE target (now nf_dup infrastructure) since it has killed the ugly pick_net() function.

There's another round of netns updates from Eric Biederman making the line. To avoid the patchbomb again to almost all the networking mailing list (that is 84 patches) I'd suggest we send you a pull request with no patches or let me know if you prefer a better way.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
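The ECMP behaviour quoted above (hash the 5-tuple, use the result as an index) can be sketched as follows. This is a toy model: `struct demo_flow`, the FNV-1a mixing and `demo_ecmp_pick()` are assumptions chosen for illustration, and real routers use their own hash functions.

```c
#include <stdint.h>

/* Hypothetical 5-tuple; addresses and ports in host byte order. */
struct demo_flow {
    uint32_t saddr;
    uint32_t daddr;
    uint16_t sport;
    uint16_t dport;
    uint8_t proto;
};

/* Mix one 32-bit word into an FNV-1a hash, byte by byte. */
static uint32_t demo_fnv1a_mix(uint32_t h, uint32_t value)
{
    for (int i = 0; i < 4; i++) {
        h ^= (value >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

/* Stateless hash over the whole 5-tuple. */
static uint32_t demo_flow_hash(const struct demo_flow *f)
{
    uint32_t h = 2166136261u;

    h = demo_fnv1a_mix(h, f->saddr);
    h = demo_fnv1a_mix(h, f->daddr);
    h = demo_fnv1a_mix(h, ((uint32_t)f->sport << 16) | f->dport);
    h = demo_fnv1a_mix(h, f->proto);
    return h;
}

/* Pick one of n_instances load balancers: same tuple, same instance. */
static unsigned int demo_ecmp_pick(const struct demo_flow *f,
                                   unsigned int n_instances)
{
    return demo_flow_hash(f) % n_instances;
}
```

An ICMP error about a TCP flow carries a different protocol number, has no ports, and usually comes from an intermediate router's address, so its tuple hashes independently of the flow it refers to and may select a different instance. That mismatch is the black-hole scenario the cover letter describes.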
2015-09-21 | tcp: send loss probe after 1s if no RTT available | Yuchung Cheng
This patch makes TLP use a 1 sec timer by default when RTT is not available due to SYN/ACK retransmission or SYN cookies. Prior to this change, the lack of RTT prevents TLP so the first data packets sent can only be recovered by fast recovery or RTO. If the fast recovery fails to trigger, the RTO is 3 seconds when SYN/ACK is retransmitted. With this patch we can trigger fast recovery in 1 sec instead. Note that we need to check Fast Open more properly. A Fast Open connection could be (accepted then) closed before it receives the final ACK of 3WHS so the state is FIN_WAIT_1. Without the new check, TLP will retransmit FIN instead of SYN/ACK. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
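The fallback rule can be sketched as a small timeout selector. This is a simplified illustration, not the kernel function: `demo_tlp_timeout_ms()` is a hypothetical name, the twice-the-RTT branch is an assumption taken from the general tail loss probe design, and the real code applies further adjustments that are omitted here.

```c
#include <stdint.h>

/* Fallback probe timeout when no RTT sample exists (1 s, per the patch). */
#define DEMO_TLP_FALLBACK_MS 1000u

/*
 * Choose the loss-probe timeout in milliseconds.
 * srtt_ms == 0 means "no RTT sample yet", for example after a SYN/ACK
 * retransmission or when SYN cookies were used.
 */
static uint32_t demo_tlp_timeout_ms(uint32_t srtt_ms)
{
    if (srtt_ms == 0)
        return DEMO_TLP_FALLBACK_MS;
    return 2u * srtt_ms;  /* assumption: probe after roughly two RTTs */
}
```

With no sample the probe fires after 1 s, well before the 3 s RTO mentioned in the message.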
2015-09-21 | tcp: usec resolution SYN/ACK RTT | Yuchung Cheng
Currently SYN/ACK RTT is measured in jiffies. For LAN the SYN/ACK RTT is often measured as 0ms or sometimes 1ms, which would affect RTT estimation and min RTT sampling used by some congestion control. This patch improves SYN/ACK RTT to be usec resolution if platform supports it. While the timestamping of SYN/ACK is done in request sock, the RTT measurement is carefully arranged to avoid storing another u64 timestamp in tcp_sock. For regular handshake w/o SYNACK retransmission, the RTT is sampled right after the child socket is created and right before the request sock is released (tcp_check_req() in tcp_minisocks.c). For Fast Open the child socket is already created when SYN/ACK was sent, the RTT is sampled in tcp_rcv_state_process() after processing the final ACK and right before the request socket is released. If the SYN/ACK was retransmitted or SYN-cookie was used, we rely on TCP timestamps to measure the RTT. The sample is taken at the same place in tcp_rcv_state_process() after the timestamp values are validated in tcp_validate_incoming(). Note that we do not store TS echo value in request_sock for SYN-cookies, because the value is already stored in tp->rx_opt used by tcp_ack_update_rtt(). One side benefit is that the RTT measurement now happens before initializing congestion control (of the passive side). Therefore the congestion control can use the SYN/ACK RTT. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
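A short worked example shows why a tick-based clock reports a LAN round trip as 0 or 1 ms. The sketch assumes a tick rate of 1000 Hz (one jiffy per millisecond); `DEMO_HZ` and both helper functions are hypothetical names, not kernel symbols.

```c
#include <stdint.h>

#define DEMO_HZ 1000u                          /* assumed tick rate */
#define DEMO_US_PER_JIFFY (1000000u / DEMO_HZ) /* 1000 us per tick */

/* RTT as the difference of two tick counters: both ends are truncated. */
static uint32_t demo_rtt_jiffies(uint64_t sent_us, uint64_t acked_us)
{
    return (uint32_t)(acked_us / DEMO_US_PER_JIFFY -
                      sent_us / DEMO_US_PER_JIFFY);
}

/* RTT from microsecond timestamps: no truncation. */
static uint32_t demo_rtt_usecs(uint64_t sent_us, uint64_t acked_us)
{
    return (uint32_t)(acked_us - sent_us);
}
```

For a 200 us round trip, sending at 10100 us and receiving at 10300 us falls inside one tick and yields 0 jiffies, while sending at 10900 us and receiving at 11100 us straddles a tick boundary and yields 1 jiffy. The microsecond version reports 200 in both cases.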
2015-09-21 | s390/iucv: do not use arrays as argument | Ursula Braun
The iucv code uses arrays as arguments. Even though this does not really cause a problem, it could be misleading, since the compiler turns array arguments into just a pointer argument. To be more precise this patch changes the array arguments into pointers. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
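The adjustment the message refers to is easy to demonstrate: a parameter written with array syntax has pointer type, so the declared bound is not part of the function's type. The function names below are hypothetical and unrelated to the iucv code.

```c
/* Declared with array syntax; the [16] is adjusted to 'unsigned char *'. */
static unsigned char demo_first_byte_array(unsigned char userdata[16])
{
    return userdata[0];
}

/* Declared with pointer syntax; this is the same function type. */
static unsigned char demo_first_byte_ptr(unsigned char *userdata)
{
    return userdata[0];
}

/*
 * Both functions fit one function-pointer type without any cast, which
 * shows that the array bound carries no type information.
 */
static unsigned char (*const demo_handlers[2])(unsigned char *) = {
    demo_first_byte_array,
    demo_first_byte_ptr,
};
```

Inside `demo_first_byte_array()`, `sizeof(userdata)` would be the size of a pointer rather than 16, which is the misleading part the patch removes by writing the parameter as a pointer.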
2015-09-21 | Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next | David S. Miller
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-09-18

Here's the first bluetooth-next pull request for the 4.4 kernel:

 - ieee802154 cleanups & fixes
 - debugfs support for the at86rf230 driver
 - Support for quirky (seemingly counterfeit) CSR Bluetooth controllers
 - Power management and device config improvements for Intel controllers
 - Fix for devices with incorrect advertising data length
 - Fix for closing HCI user channel socket

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-20 | rxrpc: Replace get_seconds with ktime_get_seconds | Ksenija Stanojevic
Replace time_t type and get_seconds function which are not y2038 safe on 32-bit systems. Function ktime_get_seconds uses monotonic instead of real time and therefore will not cause overflow. Signed-off-by: Ksenija Stanojevic <ksenija.stanojevic@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
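The y2038 limit is plain arithmetic: a signed 32-bit seconds counter tops out at 2147483647, which as a Unix timestamp is 2038-01-19 03:14:07 UTC. The helper below is a hypothetical illustration of checking whether a 64-bit seconds value still fits in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if 'secs' can be stored in a signed 32-bit seconds counter. */
static bool demo_fits_time32(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}
```

Wall-clock time passes the limit in 2038 regardless of when the machine booted. A monotonic clock counts from boot, so it would take roughly 68 years of uptime to reach the same value.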
2015-09-18 | netfilter: Use nf_ct_net instead of dev_net(out) in nf_nat_masquerade_ipv6 | Eric W. Biederman
Use nf_ct_net(ct) instead of guessing that the netdevice out can reliably report the network namespace the conntrack operation is happening in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 | netfilter: Pass net into nf_xfrm_me_harder | Eric W. Biederman
Instead of calling dev_net on a likely looking network device, pass state->net into nf_xfrm_me_harder. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 | netfilter: Pass priv instead of nf_hook_ops to netfilter hooks | Eric W. Biederman
Only pass the void *priv parameter out of the nf_hook_ops. That is all any of the functions are interested in now, and by limiting what is passed it becomes simpler to change implementation details. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 | ipvs: Read hooknum from state rather than ops->hooknum | Eric W. Biederman
This should be more cache efficient as state is more likely to be in core, and the netfilter core will stop passing in ops soon. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 | netfilter: nf_conntrack: Add a struct net parameter to l4_pkt_to_tuple | Eric W. Biederman
As gre does not have the srckey in the packet, gre_pkt_to_tuple needs to perform a lookup in its per network namespace tables. Pass in the proper network namespace to all pkt_to_tuple implementations to ensure gre (and any similar protocols) can get this right. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 | act_connmark: Remember the struct net instead of guessing it. | Eric W. Biederman
Stop guessing the struct net and remember it instead. Guessing is just silly and will be problematic in the future when I implement routes between network namespaces. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 | netfilter: Pass net to nf_dup_ipv4 and nf_dup_ipv6 | Eric W. Biederman
This allows them to stop guessing the network namespace with pick_net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-09-18 | netfilter: nf_tables: Use pkt->net instead of computing net from the passed net_devices | Eric W. Biederman
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>