From 373ccbe5927034b55bdc80b0f8b54d6e13fe8d12 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Fri, 11 Dec 2015 13:40:32 -0800 Subject: mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress Tetsuo Handa has reported that the system might basically livelock in OOM condition without triggering the OOM killer. The issue is caused by internal dependency of the direct reclaim on vmstat counter updates (via zone_reclaimable) which are performed from the workqueue context. If all the current workers get assigned to an allocation request, though, they will be looping inside the allocator trying to reclaim memory but zone_reclaimable can see stalled numbers so it will consider a zone reclaimable even though it has been scanned way too much. WQ concurrency logic will not consider this situation as a congested workqueue because it relies that worker would have to sleep in such a situation. This also means that it doesn't try to spawn new workers or invoke the rescuer thread if the one is assigned to the queue. In order to fix this issue we need to do two things. First we have to let wq concurrency code know that we are in trouble so we have to do a short sleep. In order to prevent from issues handled by 0e093d99763e ("writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone") we limit the sleep only to worker threads which are the ones of the interest anyway. The second thing to do is to create a dedicated workqueue for vmstat and mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to have a spare worker thread for it. Signed-off-by: Michal Hocko Reported-by: Tetsuo Handa Cc: Tejun Heo Cc: Cristopher Lameter Cc: Joonsoo Kim Cc: Arkadiusz Miskiewicz Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/backing-dev.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) (limited to 'mm/backing-dev.c') diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 8ed2ffd963c5..7340353f8aea 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -957,8 +957,9 @@ EXPORT_SYMBOL(congestion_wait); * jiffies for either a BDI to exit congestion of the given @sync queue * or a write to complete. * - * In the absence of zone congestion, cond_resched() is called to yield - * the processor if necessary but otherwise does not sleep. + * In the absence of zone congestion, a short sleep or a cond_resched is + * performed to yield the processor and to allow other subsystems to make + * a forward progress. * * The return value is 0 if the sleep is for the full timeout. Otherwise, * it is the number of jiffies that were still remaining when the function @@ -978,7 +979,19 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout) */ if (atomic_read(&nr_wb_congested[sync]) == 0 || !test_bit(ZONE_CONGESTED, &zone->flags)) { - cond_resched(); + + /* + * Memory allocation/reclaim might be called from a WQ + * context and the current implementation of the WQ + * concurrency control doesn't recognize that a particular + * WQ is congested if the worker thread is looping without + * ever sleeping. Therefore we have to do a short sleep + * here rather than calling cond_resched(). + */ + if (current->flags & PF_WQ_WORKER) + schedule_timeout(1); + else + cond_resched(); /* In case we scheduled, work out time remaining */ ret = timeout - (jiffies - start); -- cgit v1.2.3 From 7d828602e5ef3297a69392a2d31264e4ab9c8bb7 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Thu, 14 Jan 2016 15:20:56 -0800 Subject: mm: memcontrol: export root_mem_cgroup A later patch will need this symbol in files other than memcontrol.c, so export it now and replace mem_cgroup_root_css at the same time. Signed-off-by: Johannes Weiner Acked-by: Michal Hocko Acked-by: David S. Miller Reviewed-by: Vladimir Davydov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/backing-dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'mm/backing-dev.c') diff --git a/mm/backing-dev.c b/mm/backing-dev.c index 7340353f8aea..cc5d29d2da9b 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -672,7 +672,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi) ret = wb_init(&bdi->wb, bdi, 1, GFP_KERNEL); if (!ret) { - bdi->wb.memcg_css = mem_cgroup_root_css; + bdi->wb.memcg_css = &root_mem_cgroup->css; bdi->wb.blkcg_css = blkcg_root_css; } return ret; -- cgit v1.2.3