Age | Commit message | Author |
|
commit bbe4b3af9d9e3172fb9aa1f8dcdfaedcb381fc64 upstream.
A memory block was allocated in intel_svm_bind_mm() but never freed
in a failure path. This patch frees it on that path to avoid a memory
leak.
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.4+
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Fixes: 2f26e0a9c9860 ('iommu/vt-d: Add basic SVM PASID support')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
[ Upstream commit 72d548113881dd32bf7f0b221d031e6586468437 ]
It is unlikely request_threaded_irq will fail, but if it does for some
reason we should clear iommu->pr_irq in the error path. Also
intel_svm_finish_prq shouldn't try to clean up the page request
interrupt if pr_irq is 0. Without these, if request_threaded_irq were
to fail the following occurs:
fail with no fixes:
[ 0.683147] ------------[ cut here ]------------
[ 0.683148] NULL pointer, cannot free irq
[ 0.683158] WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1632 irq_domain_free_irqs+0x126/0x140
[ 0.683160] Modules linked in:
[ 0.683163] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2 #3
[ 0.683165] Hardware name: /NUC7i3BNB, BIOS BNKBL357.86A.0036.2017.0105.1112 01/05/2017
[ 0.683168] RIP: 0010:irq_domain_free_irqs+0x126/0x140
[ 0.683169] RSP: 0000:ffffc90000037ce8 EFLAGS: 00010292
[ 0.683171] RAX: 000000000000001d RBX: ffff880276283c00 RCX: ffffffff81c5e5e8
[ 0.683172] RDX: 0000000000000001 RSI: 0000000000000096 RDI: 0000000000000246
[ 0.683174] RBP: ffff880276283c00 R08: 0000000000000000 R09: 000000000000023c
[ 0.683175] R10: 0000000000000007 R11: 0000000000000000 R12: 000000000000007a
[ 0.683176] R13: 0000000000000001 R14: 0000000000000000 R15: 0000010010000000
[ 0.683178] FS: 0000000000000000(0000) GS:ffff88027ec80000(0000) knlGS:0000000000000000
[ 0.683180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.683181] CR2: 0000000000000000 CR3: 0000000001c09001 CR4: 00000000003606e0
[ 0.683182] Call Trace:
[ 0.683189] intel_svm_finish_prq+0x3c/0x60
[ 0.683191] free_dmar_iommu+0x1ac/0x1b0
[ 0.683195] init_dmars+0xaaa/0xaea
[ 0.683200] ? klist_next+0x19/0xc0
[ 0.683203] ? pci_do_find_bus+0x50/0x50
[ 0.683205] ? pci_get_dev_by_id+0x52/0x70
[ 0.683208] intel_iommu_init+0x498/0x5c7
[ 0.683211] pci_iommu_init+0x13/0x3c
[ 0.683214] ? e820__memblock_setup+0x61/0x61
[ 0.683217] do_one_initcall+0x4d/0x1a0
[ 0.683220] kernel_init_freeable+0x186/0x20e
[ 0.683222] ? set_debug_rodata+0x11/0x11
[ 0.683225] ? rest_init+0xb0/0xb0
[ 0.683226] kernel_init+0xa/0xff
[ 0.683229] ret_from_fork+0x1f/0x30
[ 0.683259] Code: 89 ee 44 89 e7 e8 3b e8 ff ff 5b 5d 44 89 e7 44 89 ee 41 5c 41 5d 41 5e e9 a8 84 ff ff 48 c7 c7 a8 71 a7 81 31 c0 e8 6a d3 f9 ff <0f> ff 5b 5d 41 5c 41 5d 41 5e c3 0f 1f 44 00 00 66 2e 0f 1f 84
[ 0.683285] ---[ end trace f7650e42792627ca ]---
with iommu->pr_irq = 0, but no check in intel_svm_finish_prq:
[ 0.669561] ------------[ cut here ]------------
[ 0.669563] Trying to free already-free IRQ 0
[ 0.669573] WARNING: CPU: 3 PID: 1 at kernel/irq/manage.c:1546 __free_irq+0xa4/0x2c0
[ 0.669574] Modules linked in:
[ 0.669577] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2 #4
[ 0.669579] Hardware name: /NUC7i3BNB, BIOS BNKBL357.86A.0036.2017.0105.1112 01/05/2017
[ 0.669581] RIP: 0010:__free_irq+0xa4/0x2c0
[ 0.669582] RSP: 0000:ffffc90000037cc0 EFLAGS: 00010082
[ 0.669584] RAX: 0000000000000021 RBX: 0000000000000000 RCX: ffffffff81c5e5e8
[ 0.669585] RDX: 0000000000000001 RSI: 0000000000000086 RDI: 0000000000000046
[ 0.669587] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000023c
[ 0.669588] R10: 0000000000000007 R11: 0000000000000000 R12: ffff880276253960
[ 0.669589] R13: ffff8802762538a4 R14: ffff880276253800 R15: ffff880276283600
[ 0.669593] FS: 0000000000000000(0000) GS:ffff88027ed80000(0000) knlGS:0000000000000000
[ 0.669594] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.669596] CR2: 0000000000000000 CR3: 0000000001c09001 CR4: 00000000003606e0
[ 0.669602] Call Trace:
[ 0.669616] free_irq+0x30/0x60
[ 0.669620] intel_svm_finish_prq+0x34/0x60
[ 0.669623] free_dmar_iommu+0x1ac/0x1b0
[ 0.669627] init_dmars+0xaaa/0xaea
[ 0.669631] ? klist_next+0x19/0xc0
[ 0.669634] ? pci_do_find_bus+0x50/0x50
[ 0.669637] ? pci_get_dev_by_id+0x52/0x70
[ 0.669639] intel_iommu_init+0x498/0x5c7
[ 0.669642] pci_iommu_init+0x13/0x3c
[ 0.669645] ? e820__memblock_setup+0x61/0x61
[ 0.669648] do_one_initcall+0x4d/0x1a0
[ 0.669651] kernel_init_freeable+0x186/0x20e
[ 0.669653] ? set_debug_rodata+0x11/0x11
[ 0.669656] ? rest_init+0xb0/0xb0
[ 0.669658] kernel_init+0xa/0xff
[ 0.669661] ret_from_fork+0x1f/0x30
[ 0.669662] Code: 7a 08 75 0e e9 c3 01 00 00 4c 39 7b 08 74 57 48 89 da 48 8b 5a 18 48 85 db 75 ee 89 ee 48 c7 c7 78 67 a7 81 31 c0 e8 4c 37 fa ff <0f> ff 48 8b 34 24 4c 89 ef e8 0e 4c 68 00 49 8b 46 40 48 8b 80
[ 0.669688] ---[ end trace 58a470248700f2fc ]---
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Somehow I ended up with an off-by-three error in calculating the size of
the PASID and PASID State tables, which triggers allocation failures as
those tables unfortunately have to be physically contiguous.
In fact, even the *correct* maximum size of 8MiB is problematic and is
wont to lead to allocation failures. Since I have extracted a promise
that this *will* be fixed in hardware, I'm happy to limit it on the
current hardware to a maximum of 0x20000 PASIDs, which gives us 1MiB
tables — still not ideal, but better than before.
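The sizes quoted above can be checked with a little arithmetic. The following is a minimal sketch, not the driver's code: it assumes 8-byte table entries (which is what the 8MiB and 1MiB figures imply for 2^20 and 0x20000 PASIDs), and the names pasid_table_bytes(), pasid_clamp(), PASID_ENTRY_BYTES and PASID_LIMIT are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each table entry is 8 bytes, as implied by the sizes
 * quoted in the message. */
#define PASID_ENTRY_BYTES 8u

/* Hypothetical name for the cap described in the message. */
#define PASID_LIMIT 0x20000u

/* Bytes needed for a physically contiguous table of 'pasid_max' entries. */
static uint64_t pasid_table_bytes(uint32_t pasid_max)
{
	/* PASIDs are at most 20 bits wide. */
	assert(pasid_max <= (1u << 20));
	return (uint64_t)pasid_max * PASID_ENTRY_BYTES;
}

/* Clamp the PASID count advertised by the hardware to the cap. */
static uint32_t pasid_clamp(uint32_t hw_pasid_max)
{
	return hw_pasid_max > PASID_LIMIT ? PASID_LIMIT : hw_pasid_max;
}
```

Under these assumptions, pasid_table_bytes(1u << 20) is 8MiB, while pasid_table_bytes(pasid_clamp(1u << 20)) is 1MiB. Whatever size is used, the same value has to be used when freeing the table.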
Reported by Mika Kuoppala <mika.kuoppala@linux.intel.com> and also by
Xunlei Pang <xlpang@redhat.com> who submitted a simpler patch to fix
only the allocation (and not the free) to the "correct" limit... which
was still problematic.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
|
|
We always have vma->vm_mm around.
Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
According to the VT-d specification we need to clear the PPR bit in
the Page Request Status register when handling page requests, or the
hardware won't generate any more interrupts.
This wasn't actually necessary on SKL/KBL (which may well be the
subject of a hardware erratum, although it's harmless enough). But
other implementations do appear to get it right, and we only ever get
one interrupt unless we clear the PPR bit.
Reported-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
|
|
Holding mm_users works OK for graphics, which was the first user of SVM
with VT-d. However, it works less well for other devices, where we actually
do a mmap() from the file descriptor to which the SVM PASID state is tied.
In this case on process exit we end up with a recursive reference count:
 - The MM remains alive until the file is closed and the driver's release()
   call ends up unbinding the PASID.
 - The VMA corresponding to the mmap() remains intact until the MM is
   destroyed.
 - Thus the file isn't closed, even when exit_files() runs, because the
   VMA is still holding a reference to it. And the MM remains alive…
To address this issue, we *stop* holding mm_users while the PASID is bound.
We already hold mm_count by virtue of the MMU notifier, and that can be
made to be sufficient.
It means that for a period during process exit, the fun part of mmput()
has happened and exit_mmap() has been called so the MM is basically
defunct. But the PGD still exists and the PASID is still bound to it.
During this period, we have to be very careful — exit_mmap() doesn't use
mm->mmap_sem because it doesn't expect anyone else to be touching the MM
(quite reasonably, since mm_users is zero). So we also need to fix the
fault handler to just report failure if mm_users is already zero, and to
temporarily bump mm_users while handling any faults.
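The "fail if mm_users is already zero, otherwise bump it for the duration of the fault" rule can be illustrated with a small userspace model. This is only a sketch using C11 atomics, not the kernel's implementation; struct mm_model and the helper names are hypothetical stand-ins for the real mm and its reference counts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two reference counts on an mm. */
struct mm_model {
	atomic_int users;	/* models mm_users: address space still in use */
	atomic_int count;	/* models mm_count: structure still allocated */
};

/* Take a 'users' reference only if the count has not already hit zero. */
static bool users_get_not_zero(struct mm_model *mm)
{
	int old = atomic_load(&mm->users);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&mm->users, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken by users_get_not_zero(). */
static void users_put(struct mm_model *mm)
{
	int prev = atomic_fetch_sub(&mm->users, 1);

	assert(prev > 0);
	(void)prev;
}

/* Returns true if the fault was serviced, false if it must be failed. */
static bool handle_fault_model(struct mm_model *mm)
{
	if (!users_get_not_zero(mm))
		return false;	/* address space is being torn down: hands off */

	/* ... the fault would be handled here, under the temporary reference ... */

	users_put(mm);
	return true;
}
```

The compare-and-swap loop is what makes the check and the increment a single step: a plain "test, then increment" could resurrect a count that dropped to zero in between.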
Additionally, exit_mmap() calls mmu_notifier_release() *before* it tears
down the page tables, which is too early for us to flush the IOTLB for
this PASID. And __mmu_notifier_release() removes every notifier from the
list, so when exit_mmap() finally *does* tear down the mappings and
clear the page tables, we don't get notified. So we work around this by
clearing the PASID table entry in our MMU notifier release() callback.
That way, the hardware *can't* get any pages back from the page tables
before they get cleared.
Hardware designers have confirmed that the resulting 'PASID not present'
faults should be handled just as gracefully as 'page not present' faults,
the important criterion being that they don't perturb the operation for
any *other* PASID in the system.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
|
|
Not doing so is a bug and might trigger a BUG_ON in
handle_mm_fault(). So add the proper permission checks
before calling into mm code.
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
This is the downside of using bitfields in the struct definition, rather
than doing all the explicit masking and shifting.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Not entirely clear why, but it seems we need to reserve PASID zero and
flush it when we make a PASID entry present.
Quite why we couldn't use the true PASID value isn't clear.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Change the 'pages' parameter to 'unsigned long' to avoid overflow.
Fix the device-IOTLB flush parameter calculation — the size of the IOTLB
flush is indicated by the position of the least significant zero bit in
the address field. For example, a value of 0x12345f000 will flush from
0x123440000 to 0x12347ffff (256KiB).
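The encoding in the example above can be worked through in code. This is a minimal sketch rather than the driver's calculation: it assumes the descriptor's size flag is set, a 4KiB page (shift of 12), and a naturally aligned power-of-two range; struct flush_range, decode_flush_addr() and encode_flush_addr() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_VTD_PAGE_SHIFT 12	/* assumption: 4KiB VT-d pages */

struct flush_range {
	uint64_t first;	/* first byte covered by the flush */
	uint64_t last;	/* last byte covered by the flush */
};

/* Decode an address field whose least significant zero bit marks the size. */
static struct flush_range decode_flush_addr(uint64_t addr)
{
	unsigned int bit = MODEL_VTD_PAGE_SHIFT;
	struct flush_range r;
	uint64_t size;

	/* Find the least significant zero bit at or above the page shift. */
	while (bit < 62 && ((addr >> bit) & 1))
		bit++;

	/* A zero at position 'bit' means the range spans 2^(bit + 1) bytes. */
	size = 1ULL << (bit + 1);
	r.first = addr & ~(size - 1);
	r.last = r.first + size - 1;
	return r;
}

/* Encode an aligned range of 'size' bytes (a power of two, >= 8KiB). */
static uint64_t encode_flush_addr(uint64_t base, uint64_t size)
{
	uint64_t page_mask = (1ULL << MODEL_VTD_PAGE_SHIFT) - 1;

	assert(size >= (2ULL << MODEL_VTD_PAGE_SHIFT));
	assert((size & (size - 1)) == 0);	/* power of two */
	assert((base & (size - 1)) == 0);	/* naturally aligned */

	/* Set every page-granular bit below the size-marking zero bit. */
	return base | (((size >> 1) - 1) & ~page_mask);
}
```

With these helpers, decode_flush_addr(0x12345f000) yields the range 0x123440000 to 0x12347ffff: bits 12 to 16 are set and bit 17 is the first zero, so the flush covers 2^18 bytes.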
Finally, the cap_pgsel_inv() check is not relevant to SVM; the spec says that
*all* implementations must support page-selective invalidation for
"first-level" translations. So don't check for it.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
There is an extra semi-colon on this if statement so we always break on
the first iteration.
Fixes: 0204a4960982 ('iommu/vt-d: Add callback to device driver on page faults')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
When flushing kernel-mode PASIDs, we need to flush global pages too.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
This really should be VTD_PAGE_SHIFT, not PAGE_SHIFT. Not that we ever
really anticipate seeing this used on IA64, but we should get it right
anyway.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
The "req->addr" variable is a bit field declared as "u64 addr:52;".
The "address" variable is a u64. We need to cast "req->addr" to a u64
before the shift or the result is truncated to 52 bits.
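The cast can be shown on a cut-down structure. This is a sketch only: struct page_req_model is hypothetical and mirrors nothing but the 52-bit "addr" field quoted above, and the shift of 12 is an assumption. Whether the uncast shift actually truncates depends on how the compiler types a bit-field expression, so the example shows only the widened form.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4KiB pages */

/* Hypothetical cut-down request; only 'addr:52' follows the message. */
struct page_req_model {
	uint64_t flags:12;
	uint64_t addr:52;
};

/* Build a request from a page frame number that fits in 52 bits. */
static struct page_req_model make_req(uint64_t pfn)
{
	struct page_req_model req = { .flags = 0, .addr = 0 };

	assert(pfn < (1ULL << 52));
	req.addr = pfn;
	return req;
}

/* Widen the bit-field to a full 64-bit value first, then shift. */
static uint64_t req_address(const struct page_req_model *req)
{
	return (uint64_t)req->addr << MODEL_PAGE_SHIFT;
}
```

For a request whose addr field has all 52 bits set, req_address() returns 0xfffffffffffff000, keeping the top 12 bits that a 52-bit intermediate result would drop.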
Fixes: a222a7f0bb6c ('iommu/vt-d: Implement page request handling')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Dan Carpenter pointed out an error path which could lead to us
dereferencing the 'svm' pointer after we know it to be NULL because the
PASID lookup failed. Fix that, and make it less likely to happen again.
Fixes: a222a7f0bb6c ('iommu/vt-d: Implement page request handling')
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
This is only usable for the static 1:1 mapping of physical memory.
Any access to vmalloc or module regions will require some way of doing
an IOTLB flush. It's theoretically possible to hook into the
tlb_flush_kernel_range() function, but that seems like overkill — most
of the addresses accessed through a kernel PASID *will* be in the 1:1
mapping.
If we really need to allow access to more interesting kernel regions,
then the answer will probably be an explicit IOTLB flush call after use,
akin to the DMA API's unmap function.
In fact, it might be worth introducing that sooner rather than later, and
making it just BUG() if the address isn't in the static 1:1 mapping.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Largely based on the driver-mode implementation by Jesse Barnes.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
This provides basic PASID support for endpoint devices, tested with a
version of the i915 driver.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Add CONFIG_INTEL_IOMMU_SVM, and allocate PASID tables on supported hardware.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|