From 943dca1a1fcbccb58de944669b833fd38a6c809b Mon Sep 17 00:00:00 2001 From: Yasuaki Ishimatsu Date: Tue, 21 Jan 2014 15:49:06 -0800 Subject: mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve Yasuaki Ishimatsu reported that memory hot-add spent more than 5 _hours_ on a 9TB memory machine because onlining memory sections is too slow, and we found that setup_zone_migrate_reserve() accounted for >90% of that time. The problem is that setup_zone_migrate_reserve() scans all pageblocks unconditionally, but the scan is only necessary if the number of reserved blocks was reduced (i.e. memory hot-remove). Moreover, the maximum number of MIGRATE_RESERVE pageblocks per zone is currently 2, so the number of reserved pageblocks almost never changes. This patch adds zone->nr_migrate_reserve_block to track the number of MIGRATE_RESERVE pageblocks, which reduces the overhead of setup_zone_migrate_reserve() dramatically. The following table shows the time to online one memory section. Amount of memory | 128GB | 192GB | 256GB | --------------------------------------------- linux-3.12 | 23.9 | 31.4 | 44.5 | This patch | 8.3 | 8.3 | 8.6 | Mel's proposal patch | 10.9 | 19.2 | 31.3 | --------------------------------------------- (millisecond) 128GB : 4 nodes and each node has 32GB of memory 192GB : 6 nodes and each node has 32GB of memory 256GB : 8 nodes and each node has 32GB of memory (*1) Mel proposed his idea in the following thread. https://lkml.org/lkml/2013/10/30/272 [akpm@linux-foundation.org: tweak comment] Signed-off-by: KOSAKI Motohiro Signed-off-by: Yasuaki Ishimatsu Reported-by: Yasuaki Ishimatsu Tested-by: Yasuaki Ishimatsu Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5248fe070aa4..89d81f4429ca 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3901,6 +3901,7 @@ static void setup_zone_migrate_reserve(struct zone *zone) struct page *page; unsigned long block_migratetype; int reserve; + int old_reserve; /* * Get the start pfn, end pfn and the number of blocks to reserve @@ -3922,6 +3923,12 @@ static void setup_zone_migrate_reserve(struct zone *zone) * future allocation of hugepages at runtime. */ reserve = min(2, reserve); + old_reserve = zone->nr_migrate_reserve_block; + + /* When memory hot-add, we almost always need to do nothing */ + if (reserve == old_reserve) + return; + zone->nr_migrate_reserve_block = reserve; for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) { if (!pfn_valid(pfn)) @@ -3959,6 +3966,12 @@ static void setup_zone_migrate_reserve(struct zone *zone) reserve--; continue; } + } else if (!old_reserve) { + /* + * At boot time we don't need to scan the whole zone + * for turning off MIGRATE_RESERVE. + */ + break; } /* -- cgit v1.2.3 From aec6a8889a98a0cd58357cd0937a25189908f191 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Tue, 21 Jan 2014 15:49:13 -0800 Subject: mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNT Commit 4b59e6c47309 ("mm, show_mem: suppress page counts in non-blockable contexts") introduced SHOW_MEM_FILTER_PAGE_COUNT to suppress PFN walks on large memory machines. Commit c78e93630d15 ("mm: do not walk all of system memory during show_mem") avoided a PFN walk in the generic show_mem helper, which removes the requirement for SHOW_MEM_FILTER_PAGE_COUNT in that case. This patch removes PFN walkers from the arch-specific implementations that report on a per-node or per-zone granularity.
ARM and unicore32 still do a PFN walk as they report memory usage on each bank which is a much finer granularity where the debugging information may still be of use. As the remaining arches doing PFN walks have relatively small amounts of memory, this patch simply removes SHOW_MEM_FILTER_PAGE_COUNT. [akpm@linux-foundation.org: fix parisc] Signed-off-by: Mel Gorman Acked-by: David Rientjes Cc: Tony Luck Cc: Russell King Cc: James Bottomley Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 7 ------- 1 file changed, 7 deletions(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 89d81f4429ca..ec4417cb458a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2071,13 +2071,6 @@ void warn_alloc_failed(gfp_t gfp_mask, int order, const char *fmt, ...) debug_guardpage_minorder() > 0) return; - /* - * Walking all memory to count page types is very expensive and should - * be inhibited in non-blockable contexts. - */ - if (!(gfp_mask & __GFP_WAIT)) - filter |= SHOW_MEM_FILTER_PAGE_COUNT; - /* * This documents exceptions given to allocations in certain * contexts that are allowed to allocate outside current's set -- cgit v1.2.3 From b2f3eebe7a8ef6cd4e2ea088ac7f613793f6cad6 Mon Sep 17 00:00:00 2001 From: Tang Chen Date: Tue, 21 Jan 2014 15:49:38 -0800 Subject: x86, numa, acpi, memory-hotplug: make movable_node have higher priority If users specify the original movablecore=nn@ss boot option, the kernel will arrange [ss, ss+nn) as ZONE_MOVABLE. The kernelcore=nn@ss boot option is similar except it specifies ZONE_NORMAL ranges. Now, if users specify "movable_node" in kernel commandline, the kernel will arrange hotpluggable memory in SRAT as ZONE_MOVABLE. And if users do this, all the other movablecore=nn@ss and kernelcore=nn@ss options should be ignored. For those who don't want this, just specify nothing. The kernel will act as before. Signed-off-by: Tang Chen Signed-off-by: Zhang Yanfei Reviewed-by: Wanpeng Li Cc: "H. Peter Anvin" Cc: "Rafael J . Wysocki" Cc: Chen Tang Cc: Gong Chen Cc: Ingo Molnar Cc: Jiang Liu Cc: Johannes Weiner Cc: Lai Jiangshan Cc: Larry Woodman Cc: Len Brown Cc: Liu Jiang Cc: Mel Gorman Cc: Michal Nazarewicz Cc: Minchan Kim Cc: Prarit Bhargava Cc: Rik van Riel Cc: Taku Izumi Cc: Tejun Heo Cc: Thomas Gleixner Cc: Thomas Renninger Cc: Toshi Kani Cc: Vasilis Liaskovitis Cc: Wen Congyang Cc: Yasuaki Ishimatsu Cc: Yinghai Lu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ec4417cb458a..4f59d1986018 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5018,9 +5018,33 @@ static void __init find_zone_movable_pfns_for_nodes(void) nodemask_t saved_node_state = node_states[N_MEMORY]; unsigned long totalpages = early_calculate_totalpages(); int usable_nodes = nodes_weight(node_states[N_MEMORY]); + struct memblock_type *type = &memblock.memory; + + /* Need to find movable_zone earlier when movable_node is specified. */ + find_usable_zone_for_movable(); + + /* + * If movable_node is specified, ignore kernelcore and movablecore + * options. + */ + if (movable_node_is_enabled()) { + for (i = 0; i < type->cnt; i++) { + if (!memblock_is_hotpluggable(&type->regions[i])) + continue; + + nid = type->regions[i].nid; + + usable_startpfn = PFN_DOWN(type->regions[i].base); + zone_movable_pfn[nid] = zone_movable_pfn[nid] ? 
+ min(usable_startpfn, zone_movable_pfn[nid]) : + usable_startpfn; } goto out2; } /* - * If movablecore was specified, calculate what size of + * If movablecore=nn[KMG] was specified, calculate what size of * kernelcore that corresponds so that memory usable for * any allocation type is evenly spread. If both kernelcore * and movablecore are specified, then the value of kernelcore @@ -5046,7 +5070,6 @@ static void __init find_zone_movable_pfns_for_nodes(void) goto out; /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ - find_usable_zone_for_movable(); usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; restart: @@ -5137,6 +5160,7 @@ restart: if (usable_nodes && required_kernelcore > usable_nodes) goto restart; +out2: /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */ for (nid = 0; nid < MAX_NUMNODES; nid++) zone_movable_pfn[nid] = -- cgit v1.2.3 From 6782832eba5e8c87a749a41da8deda1c3ef67ba0 Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar Date: Tue, 21 Jan 2014 15:50:25 -0800 Subject: mm/page_alloc.c: use memblock apis for early memory allocations Switch to memblock interfaces for the early memory allocator instead of the bootmem allocator. No functional change in behavior from the bootmem users' point of view. Archs already converted to NO_BOOTMEM now directly use memblock interfaces instead of bootmem wrappers built on top of memblock. On the archs which still use bootmem, these new APIs simply fall back to the existing bootmem APIs. Signed-off-by: Grygorii Strashko Signed-off-by: Santosh Shilimkar Cc: Yinghai Lu Cc: Tejun Heo Cc: "Rafael J. Wysocki" Cc: Arnd Bergmann Cc: Christoph Lameter Cc: Greg Kroah-Hartman Cc: H. Peter Anvin Cc: Johannes Weiner Cc: KAMEZAWA Hiroyuki Cc: Konrad Rzeszutek Wilk Cc: Michal Hocko Cc: Paul Walmsley Cc: Pavel Machek Cc: Russell King Cc: Tony Lindgren Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4f59d1986018..b230e838883d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4215,7 +4215,6 @@ static noinline __init_refok int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages) { int i; - struct pglist_data *pgdat = zone->zone_pgdat; size_t alloc_size; /* @@ -4231,7 +4230,8 @@ int zone_wait_table_init(struct zone *zone, unsigned long zone_size_pages) if (!slab_is_available()) { zone->wait_table = (wait_queue_head_t *) - alloc_bootmem_node_nopanic(pgdat, alloc_size); + memblock_virt_alloc_node_nopanic( + alloc_size, zone->zone_pgdat->node_id); } else { /* * This case means that a zone whose size was 0 gets new memory @@ -4351,13 +4351,14 @@ bool __meminit early_pfn_in_nid(unsigned long pfn, int node) #endif /** - * free_bootmem_with_active_regions - Call free_bootmem_node for each active range + * free_bootmem_with_active_regions - Call memblock_free_early_nid for each active range * @nid: The node to free memory on. If MAX_NUMNODES, all nodes are freed. - * @max_low_pfn: The highest PFN that will be passed to free_bootmem_node + * @max_low_pfn: The highest PFN that will be passed to memblock_free_early_nid * * If an architecture guarantees that all ranges registered with * add_active_ranges() contain no holes and may be freed, this - * this function may be used instead of calling free_bootmem() manually.
+ * this function may be used instead of calling memblock_free_early_nid() + * manually. */ void __init free_bootmem_with_active_regions(int nid, unsigned long max_low_pfn) { @@ -4369,9 +4370,9 @@ void __init free_bootmem_with_active_regions(int nid, unsigned long max_low_pfn) end_pfn = min(end_pfn, max_low_pfn); if (start_pfn < end_pfn) - free_bootmem_node(NODE_DATA(this_nid), - PFN_PHYS(start_pfn), - (end_pfn - start_pfn) << PAGE_SHIFT); + memblock_free_early_nid(PFN_PHYS(start_pfn), + (end_pfn - start_pfn) << PAGE_SHIFT, + this_nid); } } @@ -4642,8 +4643,9 @@ static void __init setup_usemap(struct pglist_data *pgdat, unsigned long usemapsize = usemap_size(zone_start_pfn, zonesize); zone->pageblock_flags = NULL; if (usemapsize) - zone->pageblock_flags = alloc_bootmem_node_nopanic(pgdat, - usemapsize); + zone->pageblock_flags = + memblock_virt_alloc_node_nopanic(usemapsize, + pgdat->node_id); } #else static inline void setup_usemap(struct pglist_data *pgdat, struct zone *zone, @@ -4837,7 +4839,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat) size = (end - start) * sizeof(struct page); map = alloc_remap(pgdat->node_id, size); if (!map) - map = alloc_bootmem_node_nopanic(pgdat, size); + map = memblock_virt_alloc_node_nopanic(size, + pgdat->node_id); pgdat->node_mem_map = map + (pgdat->node_start_pfn - start); } #ifndef CONFIG_NEED_MULTIPLE_NODES @@ -5887,7 +5890,7 @@ void *__init alloc_large_system_hash(const char *tablename, do { size = bucketsize << log2qty; if (flags & HASH_EARLY) - table = alloc_bootmem_nopanic(size); + table = memblock_virt_alloc_nopanic(size, 0); else if (hashdist) table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL); else { -- cgit v1.2.3 From de6c60a6c115acaa721cfd499e028a413d1fcbf3 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Tue, 21 Jan 2014 15:51:07 -0800 Subject: mm: compaction: encapsulate defer reset logic Currently there are several functions to manipulate the deferred compaction state variables. The remaining case where the variables are touched directly is when a successful allocation occurs in direct compaction, or is expected to be successful in the future by kswapd. Here, the lowest order that is expected to fail is updated, and in the case of successful allocation, the deferred status and counter is reset completely. Create a new function compaction_defer_reset() to encapsulate this functionality and make it easier to understand the code. No functional change. 
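For reference, the open-coded logic being removed in the hunk that follows hints at what the new helper encapsulates. The sketch below is reconstructed from that hunk and from the description above; it is not the definition added by this patch, which lives in the compaction headers rather than in mm/page_alloc.c:

	/*
	 * Sketch of the encapsulated reset (assumed shape, reconstructed from
	 * the open-coded lines removed in the hunk below).
	 */
	static inline void compaction_defer_reset(struct zone *zone, int order,
						  bool alloc_success)
	{
		if (alloc_success) {
			/* A successful allocation clears the deferred state. */
			zone->compact_considered = 0;
			zone->compact_defer_shift = 0;
		}
		/* Record the lowest order no longer expected to fail. */
		if (order >= zone->compact_order_failed)
			zone->compact_order_failed = order + 1;
	}

The direct-compaction caller below passes alloc_success=true, so both the defer counters and the failed-order threshold are updated in a single call.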
Signed-off-by: Vlastimil Babka Acked-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Joonsoo Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b230e838883d..84da0e3bc886 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2235,10 +2235,7 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, preferred_zone, migratetype); if (page) { preferred_zone->compact_blockskip_flush = false; - preferred_zone->compact_considered = 0; - preferred_zone->compact_defer_shift = 0; - if (order >= preferred_zone->compact_order_failed) - preferred_zone->compact_order_failed = order + 1; + compaction_defer_reset(preferred_zone, order, true); count_vm_event(COMPACTSUCCESS); return page; } -- cgit v1.2.3 From aed0a0e32de387da831284fda25021de32477195 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Tue, 21 Jan 2014 15:51:12 -0800 Subject: mm, page_alloc: warn for non-blockable __GFP_NOFAIL allocation failure __GFP_NOFAIL may return NULL when coupled with GFP_NOWAIT or GFP_ATOMIC. Luckily, nothing currently does such craziness. So instead of causing such allocations to loop (potentially forever), we maintain the current behavior and also warn about the new users of the deprecated flag. Suggested-by: Andrew Morton Signed-off-by: David Rientjes Cc: Mel Gorman Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 84da0e3bc886..533e2147d14f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2525,8 +2525,15 @@ rebalance: } /* Atomic allocations - we can't balance anything */ - if (!wait) + if (!wait) { + /* + * All existing users of the deprecated __GFP_NOFAIL are + * blockable, so warn of any new users that actually allow this + * type of allocation to fail. + */ + WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL); goto nopage; + } /* Avoid recursion of direct reclaim */ if (current->flags & PF_MEMALLOC) -- cgit v1.2.3 From f0b791a34cb3cffd2bbc3ca4365c9b719fa2c9f3 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Thu, 23 Jan 2014 15:52:49 -0800 Subject: mm: print more details for bad_page() bad_page() is cool in that it prints out a bunch of data about the page. But, I can never remember which page flags are good and which are bad, or whether ->index or ->mapping is required to be NULL. This patch allows bad/dump_page() callers to specify a string about why they are dumping the page and adds explanation strings to a number of places. It also adds a 'bad_flags' argument to bad_page(), which it then dumps out separately from the flags which are actually set. This way, the messages will show specifically why the page was bad, *specifically* which flags it is complaining about, if it was a page flag combination which was the problem. 
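To illustrate the resulting interface (hypothetical call sites, not hunks from this patch): a caller that knows why it is complaining passes a human-readable reason and, when specific flag bits are the problem, the mask it objects to, so the report can name exactly which of the set flags triggered it:

	/* Dump a suspicious page with an explanation (illustrative only). */
	dump_page(page, "page still in use at teardown");

	/*
	 * Complain about forbidden flags; the bad_flags mask is printed
	 * separately from the full page->flags.
	 */
	if (page->flags & PAGE_FLAGS_CHECK_AT_FREE)
		bad_page(page, "PAGE_FLAGS_CHECK_AT_FREE flag(s) set",
			 PAGE_FLAGS_CHECK_AT_FREE);

Note that bad_page() is static to mm/page_alloc.c, so the second form is only available to checkers inside the page allocator; other callers use dump_page()/dump_page_badflags().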
[akpm@linux-foundation.org: switch to pr_alert] Signed-off-by: Dave Hansen Reviewed-by: Christoph Lameter Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 72 ++++++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 54 insertions(+), 18 deletions(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 533e2147d14f..1939f4446a36 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -295,7 +295,7 @@ static inline int bad_range(struct zone *zone, struct page *page) } #endif -static void bad_page(struct page *page) +static void bad_page(struct page *page, char *reason, unsigned long bad_flags) { static unsigned long resume; static unsigned long nr_shown; @@ -329,7 +329,7 @@ static void bad_page(struct page *page) printk(KERN_ALERT "BUG: Bad page state in process %s pfn:%05lx\n", current->comm, page_to_pfn(page)); - dump_page(page); + dump_page_badflags(page, reason, bad_flags); print_modules(); dump_stack(); @@ -383,7 +383,7 @@ static int destroy_compound_page(struct page *page, unsigned long order) int bad = 0; if (unlikely(compound_order(page) != order)) { - bad_page(page); + bad_page(page, "wrong compound order", 0); bad++; } @@ -392,8 +392,11 @@ static int destroy_compound_page(struct page *page, unsigned long order) for (i = 1; i < nr_pages; i++) { struct page *p = page + i; - if (unlikely(!PageTail(p) || (p->first_page != page))) { - bad_page(page); + if (unlikely(!PageTail(p))) { + bad_page(page, "PageTail not set", 0); + bad++; + } else if (unlikely(p->first_page != page)) { + bad_page(page, "first_page not consistent", 0); bad++; } __ClearPageTail(p); @@ -618,12 +621,23 @@ out: static inline int free_pages_check(struct page *page) { - if (unlikely(page_mapcount(page) | - (page->mapping != NULL) | - (atomic_read(&page->_count) != 0) | - (page->flags & PAGE_FLAGS_CHECK_AT_FREE) | - (mem_cgroup_bad_page_check(page)))) { - bad_page(page); + char *bad_reason = NULL; + unsigned long bad_flags = 0; + + if (unlikely(page_mapcount(page))) + bad_reason = "nonzero mapcount"; + if (unlikely(page->mapping != NULL)) + bad_reason = "non-NULL mapping"; + if (unlikely(atomic_read(&page->_count) != 0)) + bad_reason = "nonzero _count"; + if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_FREE)) { + bad_reason = "PAGE_FLAGS_CHECK_AT_FREE flag(s) set"; + bad_flags = PAGE_FLAGS_CHECK_AT_FREE; + } + if (unlikely(mem_cgroup_bad_page_check(page))) + bad_reason = "cgroup check failed"; + if (unlikely(bad_reason)) { + bad_page(page, bad_reason, bad_flags); return 1; } page_cpupid_reset_last(page); @@ -843,12 +857,23 @@ static inline void expand(struct zone *zone, struct page *page, */ static inline int check_new_page(struct page *page) { - if (unlikely(page_mapcount(page) | - (page->mapping != NULL) | - (atomic_read(&page->_count) != 0) | - (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | - (mem_cgroup_bad_page_check(page)))) { - bad_page(page); + char *bad_reason = NULL; + unsigned long bad_flags = 0; + + if (unlikely(page_mapcount(page))) + bad_reason = "nonzero mapcount"; + if (unlikely(page->mapping != NULL)) + bad_reason = "non-NULL mapping"; + if (unlikely(atomic_read(&page->_count) != 0)) + bad_reason = "nonzero _count"; + if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { + bad_reason = "PAGE_FLAGS_CHECK_AT_PREP flag set"; + bad_flags = PAGE_FLAGS_CHECK_AT_PREP; + } + if (unlikely(mem_cgroup_bad_page_check(page))) + bad_reason = "cgroup check failed"; + if (unlikely(bad_reason)) { + bad_page(page, bad_reason, 
bad_flags); return 1; } return 0; @@ -6494,12 +6519,23 @@ static void dump_page_flags(unsigned long flags) printk(")\n"); } -void dump_page(struct page *page) +void dump_page_badflags(struct page *page, char *reason, unsigned long badflags) { printk(KERN_ALERT "page:%p count:%d mapcount:%d mapping:%p index:%#lx\n", page, atomic_read(&page->_count), page_mapcount(page), page->mapping, page->index); dump_page_flags(page->flags); + if (reason) + pr_alert("page dumped because: %s\n", reason); + if (page->flags & badflags) { + pr_alert("bad because of flags:\n"); + dump_page_flags(page->flags & badflags); + } mem_cgroup_print_bad_page(page); } + +void dump_page(struct page *page, char *reason) +{ + dump_page_badflags(page, reason, 0); +} -- cgit v1.2.3 From 309381feaee564281c3d9e90fbca8963bb7428ad Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Thu, 23 Jan 2014 15:52:54 -0800 Subject: mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE Most of the VM_BUG_ON assertions are performed on a page. Usually, when one of these assertions fails we'll get a BUG_ON with a call stack and the registers. I've recently noticed based on the requests to add a small piece of code that dumps the page to various VM_BUG_ON sites that the page dump is quite useful to people debugging issues in mm. This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what VM_BUG_ON() does, also dumps the page before executing the actual BUG_ON. [akpm@linux-foundation.org: fix up includes] Signed-off-by: Sasha Levin Cc: "Kirill A. Shutemov" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1939f4446a36..f18f016cca80 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -509,12 +509,12 @@ static inline int page_is_buddy(struct page *page, struct page *buddy, return 0; if (page_is_guard(buddy) && page_order(buddy) == order) { - VM_BUG_ON(page_count(buddy) != 0); + VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy); return 1; } if (PageBuddy(buddy) && page_order(buddy) == order) { - VM_BUG_ON(page_count(buddy) != 0); + VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy); return 1; } return 0; @@ -564,8 +564,8 @@ static inline void __free_one_page(struct page *page, page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1); - VM_BUG_ON(page_idx & ((1 << order) - 1)); - VM_BUG_ON(bad_range(zone, page)); + VM_BUG_ON_PAGE(page_idx & ((1 << order) - 1), page); + VM_BUG_ON_PAGE(bad_range(zone, page), page); while (order < MAX_ORDER-1) { buddy_idx = __find_buddy_index(page_idx, order); @@ -827,7 +827,7 @@ static inline void expand(struct zone *zone, struct page *page, area--; high--; size >>= 1; - VM_BUG_ON(bad_range(zone, &page[size])); + VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); #ifdef CONFIG_DEBUG_PAGEALLOC if (high < debug_guardpage_minorder()) { @@ -980,7 +980,7 @@ int move_freepages(struct zone *zone, for (page = start_page; page <= end_page;) { /* Make sure we are not inadvertently changing nodes */ - VM_BUG_ON(page_to_nid(page) != zone_to_nid(zone)); + VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page); if (!pfn_valid_within(page_to_pfn(page))) { page++; @@ -1429,8 +1429,8 @@ void split_page(struct page *page, unsigned int order) { int i; - VM_BUG_ON(PageCompound(page)); - VM_BUG_ON(!page_count(page)); + VM_BUG_ON_PAGE(PageCompound(page), page); + VM_BUG_ON_PAGE(!page_count(page), page); #ifdef CONFIG_KMEMCHECK /* @@ 
-1577,7 +1577,7 @@ again: zone_statistics(preferred_zone, zone, gfp_flags); local_irq_restore(flags); - VM_BUG_ON(bad_range(zone, page)); + VM_BUG_ON_PAGE(bad_range(zone, page), page); if (prep_new_page(page, order, gfp_flags)) goto again; return page; @@ -6021,7 +6021,7 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags, pfn = page_to_pfn(page); bitmap = get_pageblock_bitmap(zone, pfn); bitidx = pfn_to_bitidx(zone, pfn); - VM_BUG_ON(!zone_spans_pfn(zone, pfn)); + VM_BUG_ON_PAGE(!zone_spans_pfn(zone, pfn), page); for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) if (flags & value) @@ -6539,3 +6539,4 @@ void dump_page(struct page *page, char *reason) { dump_page_badflags(page, reason, 0); } +EXPORT_SYMBOL_GPL(dump_page); -- cgit v1.2.3 From da8c757b080ee84f219fa2368cb5dd23ac304fc0 Mon Sep 17 00:00:00 2001 From: Han Pingtian Date: Thu, 23 Jan 2014 15:53:17 -0800 Subject: mm: prevent setting of a value less than 0 to min_free_kbytes If you echo -1 > /proc/sys/vm/min_free_kbytes, the system will hang. Changing proc_dointvec() to proc_dointvec_minmax() in min_free_kbytes_sysctl_handler() prevents this from happening. mhocko said: : You can still do echo $BIG_VALUE > /proc/sys/vm/min_free_kbytes and make : your machine unusable but I agree that proc_dointvec_minmax is more : suitable here as we already have: : : .proc_handler = min_free_kbytes_sysctl_handler, : .extra1 = &zero, : : It used to work properly but then 6fce56ec91b5 ("sysctl: Remove references : to ctl_name and strategy from the generic sysctl table") has removed : sysctl_intvec strategy and so extra1 is ignored. Signed-off-by: Han Pingtian Acked-by: Michal Hocko Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f18f016cca80..a818d568ddf3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5754,7 +5754,12 @@ module_init(init_per_zone_wmark_min) int min_free_kbytes_sysctl_handler(ctl_table *table, int write, void __user *buffer, size_t *length, loff_t *ppos) { - proc_dointvec(table, write, buffer, length, ppos); + int rc; + + rc = proc_dointvec_minmax(table, write, buffer, length, ppos); + if (rc) + return rc; + if (write) { user_min_free_kbytes = min_free_kbytes; setup_per_zone_wmarks(); -- cgit v1.2.3 From 42aa83cb6757800f4e2b499f5db3127761517a6a Mon Sep 17 00:00:00 2001 From: Han Pingtian Date: Thu, 23 Jan 2014 15:53:28 -0800 Subject: mm: show message when updating min_free_kbytes in thp min_free_kbytes may be raised during THP's initialization, and sometimes this overrides a value which was set by the user. Printing a message in that case clarifies the confusion. Per Michal Hocko's suggestion, only print the message when changing a value which was explicitly set by the user; per Dave Hansen's suggestion, show the old value of min_free_kbytes, which gives the user a chance to restore it.
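The THP side of this change lives in mm/huge_memory.c rather than mm/page_alloc.c, so it is not part of the diff below. As a sketch of the intended behaviour (assumed shape, reconstructed from the description above), set_recommended_min_free_kbytes() can use user_min_free_kbytes to warn only when it overrides an explicitly configured value:

	/*
	 * Sketch: raise min_free_kbytes for THP, but tell the admin when a
	 * value they explicitly configured is being overridden.
	 */
	if (recommended_min > min_free_kbytes) {
		if (user_min_free_kbytes >= 0)
			pr_info("raising min_free_kbytes from %d to %lu to help transparent hugepage allocations\n",
				min_free_kbytes, recommended_min);

		min_free_kbytes = recommended_min;
		setup_per_zone_wmarks();
	}

The >= 0 test works because user_min_free_kbytes is now initialized to -1 (the one-line hunk below) and is written only from the sysctl handler on an explicit user write.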
Signed-off-by: Han Pingtian Reviewed-by: Michal Hocko Cc: David Rientjes Cc: Mel Gorman Cc: Dave Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/page_alloc.c') diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a818d568ddf3..e3758a09a009 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -205,7 +205,7 @@ static char * const zone_names[MAX_NR_ZONES] = { }; int min_free_kbytes = 1024; -int user_min_free_kbytes; +int user_min_free_kbytes = -1; static unsigned long __meminitdata nr_kernel_pages; static unsigned long __meminitdata nr_all_pages; -- cgit v1.2.3
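For context on the two min_free_kbytes changes above: the sysctl table entry in kernel/sysctl.c (reproduced approximately here; treat the exact fields as an assumption) already carries a lower bound in .extra1, which proc_dointvec_minmax() honours but plain proc_dointvec() ignores:

	{
		.procname	= "min_free_kbytes",
		.data		= &min_free_kbytes,
		.maxlen		= sizeof(min_free_kbytes),
		.mode		= 0644,
		.proc_handler	= min_free_kbytes_sysctl_handler,
		.extra1		= &zero,	/* writes below zero are rejected */
	},

With the handler switched to proc_dointvec_minmax(), writing -1 now fails with -EINVAL instead of installing a negative watermark, and user_min_free_kbytes (initialized to -1 above) records only values the user actually set.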