summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-05-09Linux 3.2.69v3.2.69Ben Hutchings
2015-05-09Revert "KVM: s390: flush CPU on load control"Ben Hutchings
This reverts commit 823f14022fd2335affc8889a9c7e1b60258883a3, which was commit 2dca485f8740208604543c3960be31a5dd3ea603 upstream. It depends on functionality that is not present in 3.2.y. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
2015-05-09ipvs: uninitialized data with IP_VS_IPV6Dan Carpenter
commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream. The app_tcp_pkt_out() function expects "*diff" to be set and ends up using uninitialized data if CONFIG_IP_VS_IPV6 is turned on. The same issue is there in app_tcp_pkt_in(). Thanks to Julian Anastasov for noticing that. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-09ipvs: rerouting to local clients is not needed anymoreJulian Anastasov
commit 579eb62ac35845686a7c4286c0a820b4eb1f96aa upstream. commit f5a41847acc5 ("ipvs: move ip_route_me_harder for ICMP") from 2.6.37 introduced ip_route_me_harder() call for responses to local clients, so that we can provide valid rt_src after SNAT. It was used by TCP to provide valid daddr for ip_send_reply(). After commit 0a5ebb8000c5 ("ipv4: Pass explicit daddr arg to ip_send_reply()." from 3.0 this rerouting is not needed anymore and should be avoided, especially in LOCAL_IN. Fixes 3.12.33 crash in xfrm reported by Florian Wiessner: "3.12.33 - BUG xfrm_selector_match+0x25/0x2f6" Reported-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de> Tested-by: Smart Weblications GmbH - Florian Wiessner <f.wiessner@smart-weblications.de> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-09IB/core: Avoid leakage from kernel to user spaceEli Cohen
commit 377b513485fd885dea1083a9a5430df65b35e048 upstream. Clear the reserved field of struct ib_uverbs_async_event_desc which is copied to user space. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Yann Droneaud <ydroneaud@opteya.com>
2015-05-09spi: spidev: fix possible arithmetic overflow for multi-transfer messageIan Abbott
commit f20fbaad7620af2df36a1f9d1c9ecf48ead5b747 upstream. `spidev_message()` sums the lengths of the individual SPI transfers to determine the overall SPI message length. It restricts the total length, returning an error if too long, but it does not check for arithmetic overflow. For example, if the SPI message consisted of two transfers and the first has a length of 10 and the second has a length of (__u32)(-1), the total length would be seen as 9, even though the second transfer is actually very long. If the second transfer specifies a null `rx_buf` and a non-null `tx_buf`, the `copy_from_user()` could overrun the spidev's pre-allocated tx buffer before it reaches an invalid user memory address. Fix it by checking that neither the total nor the individual transfer lengths exceed the maximum allowed value. Thanks to Dan Carpenter for reporting the potential integer overflow. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Mark Brown <broonie@kernel.org> [Ian Abbott: Note: original commit compares the lengths to INT_MAX instead of bufsiz due to changes in earlier commits.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net: make skb_gso_segment error handling more robustFlorian Westphal
commit 330966e501ffe282d7184fde4518d5e0c24bc7f8 upstream. skb_gso_segment has three possible return values: 1. a pointer to the first segmented skb 2. an errno value (IS_ERR()) 3. NULL. This can happen when GSO is used for header verification. However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL and would oops when NULL is returned. Note that these call sites should never actually see such a NULL return value; all callers mask out the GSO bits in the feature argument. However, there have been issues with some protocol handlers erronously not respecting the specified feature mask in some cases. It is preferable to get 'have to turn off hw offloading, else slow' reports rather than 'kernel crashes'. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> [Brad Spengler: backported to 3.2] Signed-off-by: Brad Spengler <spender@grsecurity.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09tcp: avoid looping in tcp_send_fin()Eric Dumazet
[ Upstream commit 845704a535e9b3c76448f52af1b70e4422ea03fd ] Presence of an unbound loop in tcp_send_fin() had always been hard to explain when analyzing crash dumps involving gigantic dying processes with millions of sockets. Lets try a different strategy : In case of memory pressure, try to add the FIN flag to last packet in write queue, even if packet was already sent. TCP stack will be able to deliver this FIN after a timeout event. Note that this FIN being delivered by a retransmit, it also carries a Push flag given our current implementation. By checking sk_under_memory_pressure(), we anticipate that cooking many FIN packets might deplete tcp memory. In the case we could not allocate a packet, even with __GFP_WAIT allocation, then not sending a FIN seems quite reasonable if it allows to get rid of this socket, free memory, and not block the process from eventually doing other useful work. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - Drop inapplicable change to sk_forced_wmem_schedule() - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ip_forward: Drop frames with attached skb->skSebastian Pöhn
[ Upstream commit 2ab957492d13bb819400ac29ae55911d50a82a13 ] Initial discussion was: [FYI] xfrm: Don't lookup sk_policy for timewait sockets Forwarded frames should not have a socket attached. Especially tw sockets will lead to panics later-on in the stack. This was observed with TPROXY assigning a tw socket and broken policy routing (misconfigured). As a result frame enters forwarding path instead of input. We cannot solve this in TPROXY as it cannot know that policy routing is broken. v2: Remove useless comment Signed-off-by: Sebastian Poehn <sebastian.poehn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09gianfar: Carefully free skbs in functions called by netpoll.Eric W. Biederman
commit c9974ad4aeb36003860100221a594f3c0ccc3f78 upstream. netpoll can call functions in hard irq context that are ordinarily called in lesser contexts. For those functions use dev_kfree_skb_any and dev_consume_skb_any so skbs are freed safely from hard irq context. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: use only dev_kfree_skb() and not dev_consume_skb_any()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09benet: Call dev_kfree_skby_any instead of kfree_skb.Eric W. Biederman
commit d8ec2c02caa3515f35d6c33eedf529394c419298 upstream. Replace free_skb with dev_kfree_skb_any in be_tx_compl_process as which can be called in hard irq by netpoll, softirq context by normal napi polling, and in normal sleepable context by the network device close method. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ixgb: Call dev_kfree_skby_any instead of dev_kfree_skb.Eric W. Biederman
commit f7e79913a1d6a6139211ead3b03579b317d25a1f upstream. Replace dev_kfree_skb with dev_kfree_skb_any in functions that can be called in hard irq and other contexts. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09tg3: Call dev_kfree_skby_any instead of dev_kfree_skb.Eric W. Biederman
commit 497a27b9e1bcf6dbaea7a466cfcd866927e1b431 upstream. Replace dev_kfree_skb with dev_kfree_skb_any in functions that can be called in hard irq and other contexts. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09r8169: Call dev_kfree_skby_any instead of dev_kfree_skb.Eric W. Biederman
commit 989c9ba104d9ce53c1ca918262f3fdfb33aca12a upstream. Replace dev_kfree_skb with dev_kfree_skb_any in functions that can be called in hard irq and other contexts. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-098139too: Call dev_kfree_skby_any instead of dev_kfree_skb.Eric W. Biederman
commit a2ccd2e4bd70122523a7bf21cec4dd6e34427089 upstream. Replace dev_kfree_skb with dev_kfree_skb_any in functions that can be called in hard irq and other contexts. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-098139cp: Call dev_kfree_skby_any instead of kfree_skb.Eric W. Biederman
commit 508f81d517ed1f3f0197df63ea7ab5cd91b6f3b3 upstream. Replace kfree_skb with dev_kfree_skb_any in cp_start_xmit as it can be called in both hard irq and other contexts. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09tcp: make connect() mem charging friendlyEric Dumazet
[ Upstream commit 355a901e6cf1b2b763ec85caa2a9f04fbcc4ab4a ] While working on sk_forward_alloc problems reported by Denys Fedoryshchenko, we found that tcp connect() (and fastopen) do not call sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so sk_forward_alloc is negative while connect is in progress. We can fix this by calling regular sk_stream_alloc_skb() both for the SYN packet (in tcp_connect()) and the syn_data packet in tcp_send_syn_data() Then, tcp_send_syn_data() can avoid copying syn_data as we simply can manipulate syn_data->cb[] to remove SYN flag (and increment seq) Instead of open coding memcpy_fromiovecend(), simply use this helper. This leaves in socket write queue clean fast clone skbs. This was tested against our fastopen packetdrill tests. Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - Drop the Fast Open changes - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()Al Viro
[ Upstream commit 7d985ed1dca5c90535d67ce92ef6ca520302340a ] [I would really like an ACK on that one from dhowells; it appears to be quite straightforward, but...] MSG_PEEK isn't passed to ->recvmsg() via msg->msg_flags; as the matter of fact, neither the kernel users of rxrpc, nor the syscalls ever set that bit in there. It gets passed via flags; in fact, another such check in the same function is done correctly - as flags & MSG_PEEK. It had been that way (effectively disabled) for 8 years, though, so the patch needs beating up - that case had never been tested. If it is correct, it's -stable fodder. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09caif: fix MSG_OOB test in caif_seqpkt_recvmsg()Al Viro
[ Upstream commit 3eeff778e00c956875c70b145c52638c313dfb23 ] It should be checking flags, not msg->msg_flags. It's ->sendmsg() instances that need to look for that in ->msg_flags, ->recvmsg() ones (including the other ->recvmsg() instance in that file, as well as unix_dgram_recvmsg() this one claims to be imitating) check in flags. Braino had been introduced in commit dcda13 ("caif: Bugfix - use MSG_TRUNC in receive") back in 2010, so it goes quite a while back. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09rds: avoid potential stack overflowArnd Bergmann
[ Upstream commit f862e07cf95d5b62a5fc5e981dd7d0dbaf33a501 ] The rds_iw_update_cm_id function stores a large 'struct rds_sock' object on the stack in order to pass a pair of addresses. This happens to just fit withint the 1024 byte stack size warning limit on x86, but just exceed that limit on ARM, which gives us this warning: net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] As the use of this large variable is basically bogus, we can rearrange the code to not do that. Instead of passing an rds socket into rds_iw_get_device, we now just pass the two addresses that we have available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly, to create two address structures on the stack there. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net: sysctl_net_core: check SNDBUF and RCVBUF for min lengthAlexey Kodanev
[ Upstream commit b1cb59cf2efe7971d3d72a7b963d09a512d994c9 ] sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be set to incorrect values. Given that 'struct sk_buff' allocates from rcvbuf, incorrectly set buffer length could result to memory allocation failures. For example, set them as follows: # sysctl net.core.rmem_default=64 net.core.wmem_default = 64 # sysctl net.core.wmem_default=64 net.core.wmem_default = 64 # ping localhost -s 1024 -i 0 > /dev/null This could result to the following failure: skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32 head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL> kernel BUG at net/core/skbuff.c:102! invalid opcode: 0000 [#1] SMP ... task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000 RIP: 0010:[<ffffffff8155fbd1>] [<ffffffff8155fbd1>] skb_put+0xa1/0xb0 RSP: 0018:ffff88003ae8bc68 EFLAGS: 00010296 RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000 RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8 RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000 R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900 FS: 00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0 Stack: ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550 ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00 Call Trace: [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470 [<ffffffff81556f56>] sock_write_iter+0x146/0x160 [<ffffffff811d9612>] new_sync_write+0x92/0xd0 [<ffffffff811d9cd6>] vfs_write+0xd6/0x180 [<ffffffff811da499>] SyS_write+0x59/0xd0 [<ffffffff81651532>] system_call_fastpath+0x12/0x17 Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 RIP [<ffffffff8155fbd1>] skb_put+0xa1/0xb0 RSP <ffff88003ae8bc68> Kernel panic - not syncing: Fatal exception Moreover, the possible minimum is 1, so we can get another kernel panic: ... BUG: unable to handle kernel paging request at ffff88013caee5c0 IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0 ... Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: delete now-unused 'one' variable] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net: avoid to hang up on sending due to sysctl configuration overflow.bingtian.ly@taobao.com
commit cdda88912d62f9603d27433338a18be83ef23ac1 upstream. I found if we write a larger than 4GB value to some sysctl variables, the sending syscall will hang up forever, because these variables are 32 bits, such large values make them overflow to 0 or negative. This patch try to fix overflow or prevent from zero value setup of below sysctl variables: net.core.wmem_default net.core.rmem_default net.core.rmem_max net.core.wmem_max net.ipv4.udp_rmem_min net.ipv4.udp_wmem_min net.ipv4.tcp_wmem net.ipv4.tcp_rmem Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Li Yu <raise.sail@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - Adjust context - Delete now-unused 'zero' variable] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net: ping: Return EAFNOSUPPORT when appropriate.Lorenzo Colitti
[ Upstream commit 9145736d4862145684009d6a72a6e61324a9439e ] 1. For an IPv4 ping socket, ping_check_bind_addr does not check the family of the socket address that's passed in. Instead, make it behave like inet_bind, which enforces either that the address family is AF_INET, or that the family is AF_UNSPEC and the address is 0.0.0.0. 2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL if the socket family is not AF_INET6. Return EAFNOSUPPORT instead, for consistency with inet6_bind. 3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT instead of EINVAL if an incorrect socket address structure is passed in. 4. Make IPv6 ping sockets be IPv6-only. The code does not support IPv4, and it cannot easily be made to support IPv4 because the protocol numbers for ICMP and ICMPv6 are different. This makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead of making the socket unusable. Among other things, this fixes an oops that can be triggered by: int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP); struct sockaddr_in6 sin6 = { .sin6_family = AF_INET6, .sin6_addr = in6addr_any, }; bind(s, (struct sockaddr *) &sin6, sizeof(sin6)); Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: - Drop the IPv6 part - Adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09udp: only allow UFO for packets from SOCK_DGRAM socketsMichal Kubeček
[ Upstream commit acf8dd0a9d0b9e4cdb597c2f74802f79c699e802 ] If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer checksum is to be computed on segmentation. However, in this case, skb->csum_start and skb->csum_offset are never set as raw socket transmit path bypasses udp_send_skb() where they are usually set. As a result, driver may access invalid memory when trying to calculate the checksum and store the result (as observed in virtio_net driver). Moreover, the very idea of modifying the userspace provided UDP header is IMHO against raw socket semantics (I wasn't able to find a document clearly stating this or the opposite, though). And while allowing CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit too intrusive change just to handle a corner case like this. Therefore disallowing UFO for packets from SOCK_DGRAM seems to be the best option. Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09usb: plusb: Add support for National Instruments host-to-host cableBen Shelton
[ Upstream commit 42c972a1f390e3bc51ca1e434b7e28764992067f ] The National Instruments USB Host-to-Host Cable is based on the Prolific PL-25A1 chipset. Add its VID/PID so the plusb driver will recognize it. Signed-off-by: Ben Shelton <ben.shelton@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09macvtap: make sure neighbour code can push ethernet headerEric Dumazet
[ Upstream commit 2f1d8b9e8afa5a833d96afcd23abcb8cdf8d83ab ] Brian reported crashes using IPv6 traffic with macvtap/veth combo. I tracked the crashes in neigh_hh_output() -> memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD); Neighbour code assumes headroom to push Ethernet header is at least 16 bytes. It appears macvtap has only 14 bytes available on arches where NET_IP_ALIGN is 0 (like x86) Effect is a corruption of 2 bytes right before skb->head, and possible crashes if accessing non existing memory. This fix should also increase IPv4 performance, as paranoid code in ip_finish_output2() wont have to call skb_realloc_headroom() Reported-by: Brian Rak <brak@vultr.com> Tested-by: Brian Rak <brak@vultr.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09macvtap: limit head length of skb allocatedJason Wang
commit 16a3fa28630331e28208872fa5341ce210b901c7 upstream. We currently use hdr_len as a hint of head length which is advertised by guest. But when guest advertise a very big value, it can lead to an 64K+ allocating of kmalloc() which has a very high possibility of failure when host memory is fragmented or under heavy stress. The huge hdr_len also reduce the effect of zerocopy or even disable if a gso skb is linearized in guest. To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the head, which guarantees an order 0 allocation each time. Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net: reject creation of netdev names with colonsMatthew Thode
[ Upstream commit a4176a9391868bfa87705bcd2e3b49e9b9dd2996 ] colons are used as a separator in netdev device lookup in dev_ioctl.c Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME Signed-off-by: Matthew Thode <mthode@mthode.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ematch: Fix auto-loading of ematch modules.Ignacy Gawędzki
[ Upstream commit 34eea79e2664b314cab6a30fc582fdfa7a1bb1df ] In tcf_em_validate(), after calling request_module() to load the kind-specific module, set em->ops to NULL before returning -EAGAIN, so that module_put() is not called again by tcf_em_tree_destroy(). Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ipv4: ip_check_defrag should not assume that skb_network_offset is zeroAlexander Drozdov
[ Upstream commit 3e32e733d1bbb3f227259dc782ef01d5706bdae0 ] ip_check_defrag() may be used by af_packet to defragment outgoing packets. skb_network_offset() of af_packet's outgoing packets is not zero. Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09gen_stats.c: Duplicate xstats buffer for later useIgnacy Gawędzki
[ Upstream commit 1c4cff0cf55011792125b6041bc4e9713e46240f ] The gnet_stats_copy_app() function gets called, more often than not, with its second argument a pointer to an automatic variable in the caller's stack. Therefore, to avoid copying garbage afterwards when calling gnet_stats_finish_copy(), this data is better copied to a dynamically allocated memory that gets freed after use. [xiyou.wangcong@gmail.com: remove a useless kfree()] Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09rtnetlink: call ->dellink on failure when ->newlink existsWANG Cong
[ Upstream commit 7afb8886a05be68e376655539a064ec672de8a8e ] Ignacy reported that when eth0 is down and add a vlan device on top of it like: ip link add link eth0 name eth0.1 up type vlan id 1 We will get a refcount leak: unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2 The problem is when rtnl_configure_link() fails in rtnl_newlink(), we simply call unregister_device(), but for stacked device like vlan, we almost do nothing when we unregister the upper device, more work is done when we unregister the lower device, so call its ->dellink(). Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ppp: deflate: never return len larger than output bufferFlorian Westphal
[ Upstream commit e2a4800e75780ccf4e6c2487f82b688ba736eb18 ] When we've run out of space in the output buffer to store more data, we will call zlib_deflate with a NULL output buffer until we've consumed remaining input. When this happens, olen contains the size the output buffer would have consumed iff we'd have had enough room. This can later cause skb_over_panic when ppp_generic skb_put()s the returned length. Reported-by: Iain Douglas <centos@1n6.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ping: Fix race in free in receive pathsubashab@codeaurora.org
[ Upstream commit fc752f1f43c1c038a2c6ae58cc739ebb5953ccb0 ] An exception is seen in ICMP ping receive path where the skb destructor sock_rfree() tries to access a freed socket. This happens because ping_rcv() releases socket reference with sock_put() and this internally frees up the socket. Later icmp_rcv() will try to free the skb and as part of this, skb destructor is called and which leads to a kernel panic as the socket is freed already in ping_rcv(). -->|exception -007|sk_mem_uncharge -007|sock_rfree -008|skb_release_head_state -009|skb_release_all -009|__kfree_skb -010|kfree_skb -011|icmp_rcv -012|ip_local_deliver_finish Fix this incorrect free by cloning this skb and processing this cloned skb instead. This patch was suggested by Eric Dumazet Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09netxen: fix netxen_nic_poll() logicEric Dumazet
[ Upstream commit 6088beef3f7517717bd21d90b379714dd0837079 ] NAPI poll logic now enforces that a poller returns exactly the budget when it wants to be called again. If a driver limits TX completion, it has to return budget as well when the limit is hit, not the number of received packets. Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: d75b1ade567f ("net: less interrupt masking in NAPI") Cc: Manish Chopra <manish.chopra@qlogic.com> Acked-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ipv6: stop sending PTB packets for MTU < 1280Hagen Paul Pfeifer
[ Upstream commit 9d289715eb5c252ae15bd547cb252ca547a3c4f2 ] Reduce the attack vector and stop generating IPv6 Fragment Header for paths with an MTU smaller than the minimum required IPv6 MTU size (1280 byte) - called atomic fragments. See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1] for more information and how this "feature" can be misused. [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00 Signed-off-by: Fernando Gont <fgont@si6networks.com> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net: rps: fix cpu unplugEric Dumazet
[ Upstream commit ac64da0b83d82abe62f78b3d0e21cca31aea24fa ] softnet_data.input_pkt_queue is protected by a spinlock that we must hold when transferring packets from victim queue to an active one. This is because other cpus could still be trying to enqueue packets into victim queue. A second problem is that when we transfert the NAPI poll_list from victim to current cpu, we absolutely need to special case the percpu backlog, because we do not want to add complex locking to protect process_queue : Only owner cpu is allowed to manipulate it, unless cpu is offline. Based on initial patch from Prasad Sodagudi & Subash Abhinov Kasiviswanathan. This version is better because we do not slow down packet processing, only make migration safer. Reported-by: Prasad Sodagudi <psodagud@codeaurora.org> Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ip: zero sockaddr returned on error queueWillem de Bruijn
[ Upstream commit f812116b174e59a350acc8e4856213a166a91222 ] The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That structure is defined and allocated on the stack as struct { struct sock_extended_err ee; struct sockaddr_in(6) offender; } errhdr; The second part is only initialized for certain SO_EE_ORIGIN values. Always initialize it completely. An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that would return uninitialized bytes. Signed-off-by: Willem de Bruijn <willemb@google.com> ---- Also verified that there is no padding between errhdr.ee and errhdr.offender that could leak additional kernel data. Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09jfs: fix readdir regressionDave Kleikamp
Upstream commit 44512449, "jfs: fix readdir cookie incompatibility with NFSv4", was backported incorrectly into the stable trees which used the filldir callback (rather than dir_emit). The position is being incorrectly passed to filldir for the . and .. entries. The still-maintained stable trees that need to be fixed are 3.2.y, 3.4.y and 3.10.y. https://bugzilla.kernel.org/show_bug.cgi?id=94741 Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: jfs-discussion@lists.sourceforge.net Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09NFSv4: Minor cleanups for nfs4_handle_exception and nfs4_async_handle_errorTrond Myklebust
commit 14977489ffdb80d4caf5a184ba41b23b02fbacd9 upstream. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: This is not merely a cleanup but also fixes a regression introduced by commit 3114ea7a24d3 ("NFSv4: Return the delegation if the server returns NFS4ERR_OPENMODE"), backported in 3.2.14] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr ↵Ani Sinha
struct from userland. commit 6a2a2b3ae0759843b22c929881cc184b00cc63ff upstream. Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will break old binaries and any code for which there is no access to source code. To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland. Signed-off-by: Ani Sinha <ani@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ipv4: Missing sk_nulls_node_init() in ping_unhash().David S. Miller
commit a134f083e79fb4c3d0a925691e732c56911b4326 upstream. If we don't do that, then the poison value is left in the ->pprev backlink. This can cause crashes if we do a disconnect, followed by a connect(). Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Wen Xu <hotdog3645@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09fs: take i_mutex during prepare_binprm for set[ug]id executablesJann Horn
commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream. This prevents a race between chown() and execve(), where chowning a setuid-user binary to root would momentarily make the binary setuid root. This patch was mostly written by Linus Torvalds. Signed-off-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Drop the task_no_new_privs() and user namespace checks - Open-code file_inode() - s/READ_ONCE/ACCESS_ONCE/ - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ipv6: Don't reduce hop limit for an interfaceD.S. Ljungmark
commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a upstream. A local route may have a lower hop_limit set than global routes do. RFC 3756, Section 4.2.7, "Parameter Spoofing" > 1. The attacker includes a Current Hop Limit of one or another small > number which the attacker knows will cause legitimate packets to > be dropped before they reach their destination. > As an example, one possible approach to mitigate this threat is to > ignore very small hop limits. The nodes could implement a > configurable minimum hop limit, and ignore attempts to set it below > said limit. Signed-off-by: D.S. Ljungmark <ljungmark@modio.se> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.2: adjust ND_PRINTK() usage] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net: rds: use correct size for max unacked packets and bytesSasha Levin
commit db27ebb111e9f69efece08e4cb6a34ff980f8896 upstream. Max unacked packets/bytes is an int while sizeof(long) was used in the sysctl table. This means that when they were getting read we'd also leak kernel memory to userspace along with the timeout values. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09net: llc: use correct size for sysctl timeout entriesSasha Levin
commit 6b8d9117ccb4f81b1244aafa7bc70ef8fa45fc49 upstream. The timeout entries are sizeof(int) rather than sizeof(long), which means that when they were getting read we'd also leak kernel memory to userspace along with the timeout values. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->lenAndrey Vagin
commit 223b02d923ecd7c84cf9780bb3686f455d279279 upstream. "len" contains sizeof(nf_ct_ext) and size of extensions. In a worst case it can contain all extensions. Bellow you can find sizes for all types of extensions. Their sum is definitely bigger than 256. nf_ct_ext_types[0]->len = 24 nf_ct_ext_types[1]->len = 32 nf_ct_ext_types[2]->len = 24 nf_ct_ext_types[3]->len = 32 nf_ct_ext_types[4]->len = 152 nf_ct_ext_types[5]->len = 2 nf_ct_ext_types[6]->len = 16 nf_ct_ext_types[7]->len = 8 I have seen "len" up to 280 and my host has crashes w/o this patch. The right way to fix this problem is reducing the size of the ecache extension (4) and Florian is going to do this, but these changes will be quite large to be appropriate for a stable tree. Fixes: 5b423f6a40a0 (netfilter: nf_conntrack: fix racy timer handling with reliable) Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ALSA: usb - Creative USB X-Fi Pro SB1095 volume knob supportDmitry M. Fedin
commit 3dc8523fa7412e731441c01fb33f003eb3cfece1 upstream. Adds an entry for Creative USB X-Fi to the rc_config array in mixer_quirks.c to allow use of volume knob on the device. Adds support for newer X-Fi Pro card, known as "Model No. SB1095" with USB ID "041e:3237" Signed-off-by: Dmitry M. Fedin <dmitry.fedin@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09ocfs2: _really_ sync the right rangeAl Viro
commit 64b4e2526d1cf6e6a4db6213d6e2b6e6ab59479a upstream. "ocfs2 syncs the wrong range" had been broken; prior to it the code was doing the wrong thing in case of O_APPEND, all right, but _after_ it we were syncing the wrong range in 100% cases. *ppos, aka iocb->ki_pos is incremented prior to that point, so we are always doing sync on the area _after_ the one we'd written to. Spotted by Joseph Qi <joseph.qi@huawei.com> back in January; unfortunately, I'd missed his mail back then ;-/ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2015-05-09Defer processing of REQ_PREEMPT requests for blocked devicesBart Van Assche
commit bba0bdd7ad4713d82338bcd9b72d57e9335a664b upstream. SCSI transport drivers and SCSI LLDs block a SCSI device if the transport layer is not operational. This means that in this state no requests should be processed, even if the REQ_PREEMPT flag has been set. This patch avoids that a rescan shortly after a cable pull sporadically triggers the following kernel oops: BUG: unable to handle kernel paging request at ffffc9001a6bc084 IP: [<ffffffffa04e08f2>] mlx4_ib_post_send+0xd2/0xb30 [mlx4_ib] Process rescan-scsi-bus (pid: 9241, threadinfo ffff88053484a000, task ffff880534aae100) Call Trace: [<ffffffffa0718135>] srp_post_send+0x65/0x70 [ib_srp] [<ffffffffa071b9df>] srp_queuecommand+0x1cf/0x3e0 [ib_srp] [<ffffffffa0001ff1>] scsi_dispatch_cmd+0x101/0x280 [scsi_mod] [<ffffffffa0009ad1>] scsi_request_fn+0x411/0x4d0 [scsi_mod] [<ffffffff81223b37>] __blk_run_queue+0x27/0x30 [<ffffffff8122a8d2>] blk_execute_rq_nowait+0x82/0x110 [<ffffffff8122a9c2>] blk_execute_rq+0x62/0xf0 [<ffffffffa000b0e8>] scsi_execute+0xe8/0x190 [scsi_mod] [<ffffffffa000b2f3>] scsi_execute_req+0xa3/0x130 [scsi_mod] [<ffffffffa000c1aa>] scsi_probe_lun+0x17a/0x450 [scsi_mod] [<ffffffffa000ce86>] scsi_probe_and_add_lun+0x156/0x480 [scsi_mod] [<ffffffffa000dc2f>] __scsi_scan_target+0xdf/0x1f0 [scsi_mod] [<ffffffffa000dfa3>] scsi_scan_host_selected+0x183/0x1c0 [scsi_mod] [<ffffffffa000edfb>] scsi_scan+0xdb/0xe0 [scsi_mod] [<ffffffffa000ee13>] store_scan+0x13/0x20 [scsi_mod] [<ffffffff811c8d9b>] sysfs_write_file+0xcb/0x160 [<ffffffff811589de>] vfs_write+0xce/0x140 [<ffffffff81158b53>] sys_write+0x53/0xa0 [<ffffffff81464592>] system_call_fastpath+0x16/0x1b [<00007f611c9d9300>] 0x7f611c9d92ff Reported-by: Max Gurtuvoy <maxg@mellanox.com> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>